【Question Title】: Web Scraping Ebay Using Python and BeautifulSoup
【Posted】: 2019-10-16 01:04:33
【Question】:

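My eBay web scraper recently started giving me strange results. After inspecting the page, I think eBay changed some of its HTML. I want to list titles and prices. In this specific example, I am interested in prices for the PlayStation 1.

Here is some of my code; I have removed the parts I know are working. I tried .find_all() and .select(), and I tried searching on <div>, <a> and <li>. Some of my experiments looked promising, but I could not turn any of them into what I actually want.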

import requests
from bs4 import BeautifulSoup

# BaseURL, Syntax1 and Syntax2 should be standard across all
# eBay URLs, whereas Request and PageNumber can change.
BaseURL = "https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw="

# Syntax1 = "&_skc=50&rt=nc"
Request = "Playstation+1"
Syntax1 = "&_sacat=0&_pgn="
PageNumber = "2"

URL = BaseURL + Request + Syntax1 + PageNumber

Row = 0

# Fetch the search results page and parse it.
Res = requests.get(URL)
soup = BeautifulSoup(Res.text, 'html.parser')


# Attempts with find_all()
for post in soup.find_all("#s-item__wrapper.clearfix a"):
    print(post)

for post in soup.find_all("#clearfix a"):
    print(post)

for post in soup.find_all("#srp-results li"):
    print(post)

# Attempts with select()
for post in soup.select("#s-item__info.clearfix a"):
    print(post)

for post in soup.select("#s-item__wrapper.clearfix a"):
    print(post)

for post in soup.select("#clearfix a"):
    print(post)

for post in soup.select("#srp-results li"):
    print(post)

# Attempts with findAll() and a class filter
for post in soup.findAll('a', {'class': 's-item_info.clearfix'}):
    print(post)

for post in soup.findAll('span', {'class': 's-item__subtitle'}):
    print(post)

# Attempts with broader selectors
for post in soup.select("a"):
    print(post)

for post in soup.select("span.fee"):
    print(post)

for post in soup.select("h3.lvtitle"):
    print(post)

for post in soup.select("li"):
    print(post)

for post in soup.select("h3.s-item__title"):
    print(post)
This code

for post in soup.select("h3.lvtitle"):
    print(post)

gave me this response:

<h3 class="lvtitle"><a class="vip" href="https://www.ebay.co.uk/itm/Sony-Playstation-1-PSOne-SCPH-102-Main-PC-Board-Motherboard-Working/123842167694?epid=144184002&amp;_trkparms=ispr%3D1&amp;hash=item1cd591838e:g:bf8AAOSwJh1dMf9s&amp;enc=AQAEAAACQBPxNw%2BVj6nta7CKEs3N0qXiE5Y2jlmIfKtr%2Bxi232c57OIyrwS79xif%2FlKrPVXZAFCDQ2S71uUAjUZu8lA246CIFP9YHWpAmpdH6f%2FR4Fhpr0%2B04Wwe9eZsf52saA0HbEKTkQaAhpsd%2BN%2F%2FEeeZBhHnA%2FMA78980TOcJDziAJpzcIqM4tqeU2aSvpT35gmJnYot%2FEi0BUjzBNfZRfbIH3cGIOQkrDmI4noPWLkmVYc7xuE%2FKZV2xhm2r9jHY3VWhXcd3WBwWI4n3o6YUXuSgFFofb6ClW3%2FVgtxIgkxlnMLnvQrb3HFE8FTzmEuzhphJ6j1nDWrB8p4w%2FV0jIYSYiMxyQ6QElTMPHxXDrAuxY2%2Fpwi7wM8heimg5Evr8cx4Aeoa3SXm%2B9uaJGpTdoWwkr7B39cmNLkAn9G9MeoClEky2yRc6GLEfgtNxt4Pb79Qq8S6mybfBqzekMkXTz5GdiXBLaQsesvntzKDej%2FIwlBcwM%2B59gk44pFKbHvx7BUlOVdvzx3Ck%2BUk0ijGX0VWFl0Qh1OuoY6Y1l1iAcT6TvV65TlyNhdvUwsgwPTAEi6WKJ%2BGbnGMa%2FAftHwdDqjo4Xj15n6HrHDjgfk0%2Br%2B9S%2FqRxcmLEHXe%2FiXWzVwO4bL7oMsoD8pPLSWf%2F7cmya2LUjGd2ycb9h6bp9WxuuYdHMT%2BeWcKY2qmr2oCgIiMbW0NlOmH5nglgwsNfJMcWGMqOnEZqz9IRVO0574S0QlQJ90QLd2nbTuHQpyl0rXnl8gCNA%3D%3D&amp;checksum=123842167694fd5f2bf4d71145be83a834905bc49a96" title="Click this link to access Sony Playstation 1 PSOne SCPH-102 Main PC Board Motherboard Working">Sony Playstation 1 PSOne SCPH-102 Main PC Board Motherboard Working</a>
</h3>

I only need this part: title="Click this link to access Sony Playstation 1 PSOne SCPH-102 Main PC Board Motherboard Working">Sony Playstation 1 PSOne SCPH-102 Main PC Board Motherboard Working

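When I use this code

for post in soup.select("h3.lvtitle"):
    print(post.find('a')['title'])

I get the results below. This is the right information, but the order of the items on screen differs from the list below. I do not understand why it is not in chronological order.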

Click this link to access PS1 Sony Playstation Psone Slim Console Only with TV Cable (SCPH-102 PAL)
Click this link to access SONY PLAYSTATION 1 PS1 CONSOLE  Tested Working,  Controller & leads
Click this link to access Sony Playstation 1 PS1 PSONE Console Bundle  & Controller Used Prompt Free Post
Click this link to access PS1 Sony Playstation Psone Slim Console ~ boxed ~ SCPH-102 ~ mancave ~ christmas
Click this link to access Sony Playstation 1 (PS1) Console Bundle - Controller, 3 Games (Tomb Raider)
Click this link to access Sony Playstation 1 Grey Console+all leads and 2 controller's
Click this link to access Slim Playstation 1 Console And Games
Click this link to access Sony PS1 Slim Playstation 1 PAL Console + Crash Bandicoot Game - SCPH-102
Click this link to access Playstation 1 Console with controller and 1Mb memory card - fully working
Click this link to access Playstation 1 console & games bundle
Click this link to access Sony playstation 1 original boxed
Click this link to access Sony Playstation 1 SCPH-102 Console - Grey
Click this link to access playstation 1 games bundle, Tomb Raider 2,3, Moho, Spider-Man 1,2, Star Wars
Click this link to access PS1 CHIPPED Console ONLY Sony Playstation 1 TESTED & WORKING
Click this link to access 128G PS1 MINI True Blue Mini Crackhead Pack For Playstation Built-in 7000Games .
Click this link to access Sony Playstation Original PS1 Console (2x Controllers, AV Cable, AC Adapter)
Click this link to access Sony Playstation 1 PS1 slim console and games bundle
Click this link to access PlayStation 1 Console Bundle (SCPH-1002 Audiophile) + 3 Games - Tested & Working
Click this link to access PLAYSTATION 1 CONSOLE And 6 GAMES BUNDLE
Click this link to access Sony PlayStation 1 Dual Shock Bundle Grey Console scph 5552
Click this link to access Sony PlayStation 1 PS1 Console Bundle - 18 Games -  Controller - All Cables
Click this link to access Playstation 1 Slim
Click this link to access PS1 - Sony PlayStation Classic Mini Console Boxed (modded)
Click this link to access PS1 PSONE Sony Playstation 1 Slim Console - Tested & Working - Inc Memory Card
Click this link to access 128G PS1 MINI True Blue Mini Crackhead Pack For Playstation Built-in 7000Games `
Click this link to access Sony PlayStation Classic Mini Console with 20 GAMES PS1 (New and Sealed) PS One
Click this link to access Sony PlayStation 1 PS1 Original Console Only SCPH-9002 PAL Tested And Working
Click this link to access PlayStation 1 - (SCPH-7502) + Xplorer cheat cartridge + controller + RFU adaptor
Click this link to access SONY PLAYSTATION PS1 CONSOLE COMPLETE SETUP *TESTED AND WORKING*
Click this link to access Sony Playstation 1 original console
Click this link to access Playstation - PS1 Console - Region Free/Multi Region PAL, NTSC-J, NTSC-U/C
Click this link to access Sony PSP Playstation Portable, 7 games, 1 video, manual, power & USB leads, case
Click this link to access PS1 CONSOLE & LOADS OF GAMES  - Original PlayStation - Tested! / PLAYSTATION 1
Click this link to access Sony PlayStation 1 ps1 with gamez Dual Shock Bundle Grey Console
Click this link to access Retro Sony Playstation 1 Console, Carry Case, Games & Leads
Click this link to access Sony PlayStation 1 PS1 Original Console + 3 Games Rayman Crash Banicoot Oddworld
Click this link to access SONY Playstation 1 PS1 PSone Home Games Console Bundle With Controller - B98
Click this link to access Boxed Sony Ps1 With 1 Controller - SCPH-5552B ( PlayStation 1) Fully Working
Click this link to access Sony Playstation 1 PS1 Silver Console with 2 Official Controller
Click this link to access SONY PlayStation 4 Pro with FIFA 20 - 1 TB
Click this link to access Playstation 1 Console + 3 Games
Click this link to access Sony Playstation 1 Console PS1 with 3 Games
Click this link to access Sony Playstation 1 PS1 Grey Console FREE POSTAGE
Click this link to access playstation 1 bundle
Click this link to access Sony Playstation 1 SCPH-5502 Grey Console. Orginal playstation with controller
Click this link to access PlayStation 1 PS1 SCPH-1002 Audiophile PAL Grey Console / Tested Working
Click this link to access Sony PlayStation PS1 Dual Shock Bundle Grey Console 8 Games SCPH-9002 PAL
Click this link to access Sony Playstation 1 PS1 Console Bundle 10 games
Click this link to access SONY Playstation PS1 Console Boxed - Good Condition  SCPH-5552
Click this link to access Ps1 Console + Memory Card / Choose Slim / Phat / Audiophile - Complete Setup
Click this link to access Sony Playstation 1 PSone PS1. Working but does not include cables.Good Condition
Click this link to access Sony Playstation 2 PAL (SCPH-39003) Mem.Card 1 Controller + 20 Games
Click this link to access Sony PlayStation 3 Slim  (CECH-2503B) 1 Controller + 5 Games
Click this link to access PS1 Sony Playstation Psone Slim Console (SCPH-102 PAL) bundle leads & controller
Click this link to access Sony PS1 Playstation 1 Console Grey (With Box, Controllers, Games & Cables) PAL
Click this link to access PlayStation 1 PSOne White Console
Click this link to access SONY PLAYSTATION PS1 CONSOLE COMPLETE SETUP *TESTED AND WORKING*
Click this link to access Playstation1 Bundle
Click this link to access Sony Playstation PS One PS1 White Console + 2 Controllers & Memory Card - TESTED
Click this link to access PS1 Sony Playstation 1 PS1 Console - Bundle Joblot INC MEMORY CARD - FREE UK P&P
Click this link to access 128G PS1 MINI True Blue Mini Crackhead Pack for Playstation Built-in 7000 Games

【Comments】:

    Tags: python web-scraping beautifulsoup ebay-api findall


    【Solution 1】:

    Using a regular expression:

    Regex:

    title=".*?"
    

    Python code:

    import re
    
    pattern = r'title=".*?"'
    text = '''
    <h3 class="lvtitle"><a class="vip" href="https://www.ebay.co.uk/itm/Sony-Playstation-1-PSOne-SCPH-102-Main-PC-Board-Motherboard-Working/123842167694?epid=144184002&amp;_trkparms=ispr%3D1&amp;hash=item1cd591838e:g:bf8AAOSwJh1dMf9s&amp;enc=AQAEAAACQBPxNw%2BVj6nta7CKEs3N0qXiE5Y2jlmIfKtr%2Bxi232c57OIyrwS79xif%2FlKrPVXZAFCDQ2S71uUAjUZu8lA246CIFP9YHWpAmpdH6f%2FR4Fhpr0%2B04Wwe9eZsf52saA0HbEKTkQaAhpsd%2BN%2F%2FEeeZBhHnA%2FMA78980TOcJDziAJpzcIqM4tqeU2aSvpT35gmJnYot%2FEi0BUjzBNfZRfbIH3cGIOQkrDmI4noPWLkmVYc7xuE%2FKZV2xhm2r9jHY3VWhXcd3WBwWI4n3o6YUXuSgFFofb6ClW3%2FVgtxIgkxlnMLnvQrb3HFE8FTzmEuzhphJ6j1nDWrB8p4w%2FV0jIYSYiMxyQ6QElTMPHxXDrAuxY2%2Fpwi7wM8heimg5Evr8cx4Aeoa3SXm%2B9uaJGpTdoWwkr7B39cmNLkAn9G9MeoClEky2yRc6GLEfgtNxt4Pb79Qq8S6mybfBqzekMkXTz5GdiXBLaQsesvntzKDej%2FIwlBcwM%2B59gk44pFKbHvx7BUlOVdvzx3Ck%2BUk0ijGX0VWFl0Qh1OuoY6Y1l1iAcT6TvV65TlyNhdvUwsgwPTAEi6WKJ%2BGbnGMa%2FAftHwdDqjo4Xj15n6HrHDjgfk0%2Br%2B9S%2FqRxcmLEHXe%2FiXWzVwO4bL7oMsoD8pPLSWf%2F7cmya2LUjGd2ycb9h6bp9WxuuYdHMT%2BeWcKY2qmr2oCgIiMbW0NlOmH5nglgwsNfJMcWGMqOnEZqz9IRVO0574S0QlQJ90QLd2nbTuHQpyl0rXnl8gCNA%3D%3D&amp;checksum=123842167694fd5f2bf4d71145be83a834905bc49a96" title="Click this link to access Sony Playstation 1 PSOne SCPH-102 Main PC Board Motherboard Working">Sony Playstation 1 PSOne SCPH-102 Main PC Board Motherboard Working</a>
    </h3>
    '''
    print(re.search(pattern,text).group(0))
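
The pattern above returns the whole match, including the title= prefix and the quotes. As a minimal sketch, a capture group with re.findall can return only the attribute values, and the "Click this link to access " prefix can then be stripped. The sample HTML string and the prefix handling are assumptions for illustration, and a regex is fragile on real-world HTML.

```python
import re

# Shortened, made-up sample in the same shape as the response above.
html = (
    '<h3 class="lvtitle"><a class="vip" href="https://example.com/itm/1" '
    'title="Click this link to access Example console">Example console</a></h3>'
)

# The parentheses capture only the text between the quotes.
pattern = re.compile(r'title="(.*?)"')
titles = pattern.findall(html)

# Strip the fixed prefix if it is present (assumed from the output shown).
prefix = "Click this link to access "
cleaned = [t[len(prefix):] if t.startswith(prefix) else t for t in titles]

print(cleaned)
```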
    

    Using bs4:

    for post in soup.select("h3.lvtitle"):
        print(post.find('a')['title'])
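
Indexing a tag with ['title'] raises KeyError when the attribute is missing, and find('a') returns None when there is no link. A more defensive variant of the loop, sketched here on a made-up HTML sample (the markup is an assumption, not eBay's actual page), uses .get() and falls back to the link text:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one link with a title, one without, one heading with no link.
html = """
<h3 class="lvtitle"><a class="vip" href="https://example.com/itm/1" title="Click this link to access Example console">Example console</a></h3>
<h3 class="lvtitle"><a class="vip" href="https://example.com/itm/2">Listing without a title attribute</a></h3>
<h3 class="lvtitle">No link here</h3>
"""
soup = BeautifulSoup(html, "html.parser")

titles = []
for post in soup.select("h3.lvtitle"):
    link = post.find("a")
    if link is None:
        # No <a> inside this heading, so there is nothing to read.
        continue
    # .get() returns None instead of raising KeyError for a missing attribute.
    title = link.get("title") or link.get_text(strip=True)
    titles.append(title)

print(titles)
```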
    

    【Comments】:

    • I get this error: File "C:\Users\HP\Anaconda3\lib\site-packages\bs4\element.py", line 1016, in __getitem__ return self.attrs[key] KeyError: 'title'
    • Hey, please see my new update at the bottom of my original question.