【Question title】:Problem with getting results with BeautifulSoup4
【Posted】:2021-02-19 20:07:49
【Question】:

I'm learning web parsing and wrote a Python script using BS4. When I run the script, I only get output for one item.

import requests
from bs4 import BeautifulSoup as BS
url = 'https://www.gumtree.com.au/s-sydney/computer/k0l3003435?price-type=free'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0)'}
response = requests.get(url, headers=headers)
response.content
soup = BS(response.content, 'html.parser')
items_list = soup.find_all('section', {'class': 'search-results-page__user-ad-collection'})
for items in items_list:
    title = items.find('span', {'class': 'user-ad-row-new-design__title-span'}).text
    url_tag = items.find('a', {'href': 'user-ad-row-new-design.link--base-color-inherit.link--hover-color-none.link--no-underline'})
    url = url_tag.text if url_tag else "n/a"
print('item:', title, '\nlink:', url)

For some reason I only get the result for one item:

item: PC Joysticks 
link: n/a

Can anyone help me?

Note: this is my first time posting here, so apologies in advance.

【Question discussion】:

    Tags: python windows beautifulsoup automation html-parsing


    【Solution 1】:

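    The problem is your parent root, soup.find_all('section', {'class': 'search-results-page__user-ad-collection'}): it matches only one element, the section that wraps all the ads. Calling find() on that one section returns only the first title inside it, which is why you print a single (title, url) tuple.

    Instead, find the list of elements that correspond to the individual ads (the code below uses each ad's <a> link), then search each one for its title and URL.

    Here are the changes I made, which work: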
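To illustrate the difference, here is a minimal offline sketch. The HTML below is a hypothetical, heavily simplified stand-in for the Gumtree results page (the real markup has more classes and nesting); only the class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one <section> wrapping several ads
html = """
<section class="search-results-page__user-ad-collection">
  <a class="user-ad-row-new-design" href="/s-ad/a/1">
    <span class="user-ad-row-new-design__title-span">Computer chair</span>
  </a>
  <a class="user-ad-row-new-design" href="/s-ad/b/2">
    <span class="user-ad-row-new-design__title-span">PC Joysticks</span>
  </a>
</section>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching for the wrapper finds exactly one element...
sections = soup.find_all("section", class_="search-results-page__user-ad-collection")
print(len(sections))  # 1

# ...and find() on it returns only the FIRST matching <span>
first_title = sections[0].find("span", class_="user-ad-row-new-design__title-span").text
print(first_title)  # Computer chair

# Searching for the per-ad elements gives one match per ad
ads = soup.find_all("a", class_="user-ad-row-new-design", href=True)
titles = [ad.find("span", class_="user-ad-row-new-design__title-span").text for ad in ads]
print(titles)  # ['Computer chair', 'PC Joysticks']
```

The loop over the wrapper runs once; the loop over the per-ad anchors runs once per ad, which is the change the working code makes.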

    import requests
    from bs4 import BeautifulSoup as BS


    url = 'https://www.gumtree.com.au/s-sydney/computer/k0l3003435?price-type=free'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0)'}
    response = requests.get(url, headers=headers)
    soup = BS(response.content, 'html.parser')
    # One <a> per ad, instead of the single <section> that wraps all ads;
    # href=True skips anchors that have no link
    items_list = soup.find_all('a', attrs={'class': 'user-ad-row-new-design link link--base-color-inherit link--hover-color-none link--no-underline'}, href=True)
    for items in items_list:
        # Each ad's title is in its own <span>
        title = items.find('span', {'class': 'user-ad-row-new-design__title-span'}).text
        url = items['href']  # relative path such as /s-ad/...
        print('item:', title, '\nlink:', url)
    

    Output:

    item: Computer chair 
    link: /s-ad/minchinbury/office-chairs/computer-chair/1260894371
    item: Free computer table workstation 
    link: /s-ad/haberfield/desks/free-computer-table-workstation/1260881353
    item: PC Joysticks 
    link: /s-ad/old-toongabbie/computer-accessories/pc-joysticks/1260849652
    item: computer desk 
    link: /s-ad/menai/desks/computer-desk/1260826750
    item: ? ONLINE  Tutor in Maths, Science and Engineering ASSIGNMENT 
    link: /s-ad/the-rocks/other-books-music-games/-online-tutor-in-maths-science-and-engineering-assignment/1260785917
    item: Childrens' soft toys, family board games, assorted art, clothes 
    link: /s-ad/avalon/toys-indoor/childrens-soft-toys-family-board-games-assorted-art-clothes/1260750520
    item: Free computer chair. 
    link: /s-ad/liverpool/computer-accessories/free-computer-chair-/1260749656
    item: Wanted Free DEAD OR ALIVE Computers all types for personal training. 
    link: /s-ad/north-richmond/other-electronics-computers/wanted-free-dead-or-alive-computers-all-types-for-personal-training-/1260659962
    item: DVI-D Video Cable 
    link: /s-ad/stanmore/computer-accessories/dvi-d-video-cable/1260657663
    item: Computer games and controller 
    link: /s-ad/kensington/video-games/computer-games-and-controller/1260568719
    item: Two garage sales, Some of them FREE to go 
    link: /s-ad/eastwood/sofas/two-garage-sales-some-of-them-free-to-go/1260562401
    item: Antec case free 
    link: /s-ad/ermington/components/antec-case-free/1260451058
    item: Free computer chair 
    link: /s-ad/caringbah/office-chairs/free-computer-chair/1260411306
    item: Computer tables 
    link: /s-ad/werrington/other-electronics-computers/computer-tables/1260373820
    item: Large desk 750cms depth, 150cms wide, 74cms high, two drawers, beige 
    link: /s-ad/marrickville/desks/large-desk-750cms-depth-150cms-wide-74cms-high-two-drawers-beige/1260365079
    item: Computer books 
    link: /s-ad/belmore/textbooks/computer-books/1260345045
    item: Windows XP computer games 
    link: /s-ad/east-kurrajong/software/windows-xp-computer-games/1260332226
    item: Computer office chair 
    link: /s-ad/fairfield/office-chairs/computer-office-chair/1260286224
    item: i want non working/ old computers 
    link: /s-ad/sydney-region/desktops/i-want-non-working-old-computers/1260194032
    item: accountant books 
    link: /s-ad/epping/other-books-music-games/accountant-books/1260156161
    item: FREE    FREE - SAVE FROM LANDFILL - Office cabinet -has slight damage 
    link: /s-ad/wetherill-park/cabinets/free-free-save-from-landfill-office-cabinet-has-slight-damage/1259887356
    item: Desktop computer desk 
    link: /s-ad/miranda/desks/desktop-computer-desk/1259807032
    item: Harwood desk, computer table or study desk 
    link: /s-ad/rosebery/desks/harwood-desk-computer-table-or-study-desk/1259782448
    item: HP Deskjet D2360 Free 
    link: /s-ad/neutral-bay/printers-scanners/hp-deskjet-d2360-free/1259631311
    
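Two possible refinements to the loop above, shown offline on a hypothetical snippet. First, passing a multi-class string such as 'user-ad-row-new-design link ...' to find_all() only matches when the class attribute is exactly that string in that order, whereas a CSS selector via select() matches the classes in any order. Second, the links printed above are relative, so urllib.parse.urljoin can resolve them against the page URL. Whether Gumtree keeps this markup stable is an assumption; the snippet and its class order are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

page_url = "https://www.gumtree.com.au/s-sydney/computer/k0l3003435?price-type=free"

# Hypothetical snippet: classes from the answer, but listed in a different order
html = """
<a class="link user-ad-row-new-design link--no-underline" href="/s-ad/haberfield/desks/x/1">
  <span class="user-ad-row-new-design__title-span">Free computer table workstation</span>
</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact-string class match fails because the class order differs
exact = soup.find_all("a", attrs={"class": "user-ad-row-new-design link link--no-underline"})
print(len(exact))  # 0

# CSS selector: the <a> must have both classes (any order) and an href
results = []
for ad in soup.select("a.user-ad-row-new-design.link[href]"):
    title = ad.select_one("span.user-ad-row-new-design__title-span").get_text(strip=True)
    link = urljoin(page_url, ad["href"])  # turn /s-ad/... into an absolute URL
    results.append((title, link))
    print("item:", title, "\nlink:", link)
```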

    【Discussion】:

      【Solution 2】:

      It looks like an indentation problem. Move the final print line inside the for loop:

      for items in items_list:
          title = items.find('span', {'class': 'user-ad-row-new-design__title-span'}).text
          url_tag = items.find('a', {'href': 'user-ad-row-new-design.link--base-color-inherit.link--hover-color-none.link--no-underline'})
          url = url_tag.text if url_tag else "n/a"
          print('item:', title, '\nlink:', url)
      
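Moving print into the loop changes where the output happens, but the url_tag lookup is a separate issue: items.find('a', {'href': '...'}) searches for an <a> whose href attribute equals that class-name string, which real links do not have, so url stays "n/a". Also, .text on an <a> returns its visible text, not its link. A sketch on a hypothetical one-line snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for one ad link
html = '<a class="user-ad-row-new-design link" href="/s-ad/x/1">PC Joysticks</a>'
soup = BeautifulSoup(html, "html.parser")

# Looks for href == the class string, so nothing matches
by_href = soup.find("a", {"href": "user-ad-row-new-design.link--base-color-inherit.link--hover-color-none.link--no-underline"})
print(by_href)  # None

# Match on the class instead, then read the href attribute (not .text)
tag = soup.find("a", class_="user-ad-row-new-design")
url = tag["href"] if tag else "n/a"
print(url)  # /s-ad/x/1
```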

      【Discussion】:

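      • Hi, thanks for your reply. I tried your suggestion, but it still gives me the same result.
      • @JagjotSingh There are a few problems with how your code gets the parent element. I've added a new answer that returns the expected response; please take a look.

      【Solution 3】: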

      Does this answer your question? I haven't formatted the output nicely.

      import requests
      from bs4 import BeautifulSoup as BS
      url = 'https://www.gumtree.com.au/s-sydney/computer/k0l3003435?price-type=free'
      headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0)'}
      response = requests.get(url, headers=headers)
      soup = BS(response.content, 'html.parser')
      items_list = soup.find_all('section', {'class': 'search-results-page__user-ad-collection'})
      for item in items_list:
          print(item.text)
      for item in items_list:
          url_tag = item.find('a', {'href': 'user-ad-row-new-design.link--base-color-inherit.link--hover-color-none.link--no-underline'})
          url = url_tag.text if url_tag else "n/a"
          print('item:\n', item.text, '\nlink:', url)
      

      Output:

      Dell latitude 5510 laptop computerI am selling my laptop for $275. If you buy it, another identical laptop is FREE. Dual boot. Both computers have Windows 7 Professional and Windows 10 Pro, and you can choose the one you want to run. Intel Core i5 520M CPU (2.4GHz, 3M cache) (Dual Core) 4GB RAM 15.6” HD Anti-Glare LED Display DVD-ROM, DVD-RW, CD/DVD writer/recorder AC Adapter (fast charging technology) 4-cell battery 64 bit 4 usb ports 160 GB harddisk Office 2010 No webcam Windows 7 is activated with genuine key. IFreePrestons, NSW2 hours agoPC JoysticksHave two sets of old computer joysticks and throttles. These will require a "USB to game port" adapter to be purchased to work. Here is a suggested link to check (https://www.sbtech.com.au/usb-to-game-port-adaptor/) 
      No guarantees that they work at all and as such they are offered as a lot for free. 
      For pickup from Constitution Hill (near Parramatta NSW).FreeOld Toongabbie, NSW4 hours agoFree metal desk computer workstationFree desk about 2.5metres long. Pick up from FivedockFreeHaberfield, NSW4 hours agocomputer deskUsed computer desk with chair in reasonable condition. 
      Desk is a little bowed and chair a little knocked about but still perfectly fine. 
      Pick up only from Sutherland Shire areaFreeMenai, NSW7 hours ago? ONLINE  Tutor in Maths, Science and Engineering ASSIGNMENTOnline Tutor in Accounting, Algebra, Anatomy, Biochemistry, Biology, Business Studies, Calculus, Chemistry, Computing, Economics, Engineer, Engineering, Essay Writing, Finance, Further Maths, General Science, Geometry, Health Studies, Human Biology, Management, Maths, Maths Methods, Microbiology, Microsoft Office, Nutrition, Physical Education, Physics, Physiology, Programming, Statistics, UMAT. 
      Whether you need to catch up with class, have difficulty with exams looming or just want the extrFreeThe Rocks, NSW21 hours agoChildrens' soft toys, family board games, assorted art, clothesA collection of soft toys (all clean) and family board games, free 
      Framed art and prints, canvas botanical prints, free... 
      
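The output above runs words together (for example "computerI am selling") because .text concatenates every descendant string with no separator. A sketch using get_text(separator=..., strip=True) on a hypothetical snippet modeled on the first ad:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: title, description and price in separate tags
html = """
<div><span>Dell latitude 5510 laptop computer</span><p>I am selling my laptop.</p>
<span>Free</span></div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# .text glues adjacent strings together: "...computerI am selling..."
print(repr(div.text))

# strip=True drops whitespace-only strings; the separator goes between the rest
clean = div.get_text(separator=" | ", strip=True)
print(clean)  # Dell latitude 5510 laptop computer | I am selling my laptop. | Free
```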

      【Discussion】:
