【Title】: Python bs4 .find not detecting article
【Posted】: 2022-01-21 01:28:46
【Question】:

I am trying to get the names of the products, but when it reaches the sponsored products it returns None. Here is my code:

    next_page_url = 'https://www.jumia.com.ng/catalog/?q=oraimo&shipped_from=country_local&page=1#catalog-listing'
    result_nextpage = requests.get(next_page_url, headers=headers).text # headers are generated from default python 'fake_headers' module.
    doc_nextpage = BeautifulSoup(result_nextpage, 'lxml') # I also tried other parsers
    divs = doc_nextpage.find('div', class_='-paxs row _no-g _4cl-3cm-shs')
    result_articles = divs.select('h3.name')
    for i in result_articles:
        print(i.string)

Result:

Oraimo FreePods-3 2Baba Edition BT 5.2 Wireless Stereo Earbuds
Oraimo 27000mAh Massive Power Charing Bank Traveller 3 Byte
Oraimo OPB-P116DN 10000 Mah Power-Bank Dual Fast Charging
Oraimo FreePods3 True Wireless Stereo Earbuds IPX5 & Sweat Proof
Oraimo Smart Watch 1.69'' IPS Screen IP68 Waterproof
Oraimo FreePods-2 2Baba-version True Wireless Earbuds
Oraimo Silver Edition Smart Watch 1.69'' IPS Screen IP68 Waterproof
Oraimo Charger UKDualUSB OCW-U63D White
Oraimo Portable Wireless Speaker Subwoofer Outdoor Sound Box
Oraimo Charger Oraimo UKDualUSB OCW-U81F White
Oraimo Power Oraimo Bank OPB-P206DN 20KmAh
Oraimo SoundPro Wireless Speaker Muti-Model Music Play
Oraimo Tempo-W3 Smart Watch Health Monitor IP67 Waterproof
Oraimo Car Charger Oraimo OCC-21DML Black
Oraimo SoundPro-2C 10W Portable Wireless Bluetooth Speaker
Oraimo Necklace 5C Neckband Wireless Earphone
Oraimo COMPACT 10000mAh Ultra Slim Fast Charging Power Bank
Oraimo 10000mAh OPTIMIZED SLIM Power-bank With LED Light
Oraimo Mermaid Half In-ear Earphone With Mic
Oraimo Necklace 3 Lite Neckband BT 5.0 Wireless Earphone
Oraimo Senior BT5.0 Single Wireless Bluetooth Headsets
Oraimo True Wireless Bluetooth Earbuds- Freepods 2
Oraimo FreePods-2 2Baba-version True Wireless Earbuds
Oraimo Air-Buds-2S Super Bass Wireless Stereo Earbuds
Oraimo 20000MAH Powerbank -long Lasting PowerBank
Oraimo Bluetooth Wireless SOUNDBAR SPEAKER
Oraimo Shark-2 BT5.0 In-Ear Wireless Bluetooth Headphones
Oraimo BoomPop Over-Ear Bluetooth Wireless Headphone
Oraimo  20000MAH Powerbank -long Lasting Power For Days
Oraimo FreePods-2 2Baba-Version  True Wireless Stereo Earbud
Oraimo 2021 Latest Edition Smart Function Waterproof Smart Watch
Oraimo OCW-U36S Efficient And Durable USB Charger - Black
Oraimo FreePods-2 2Baba-Version  True Wireless Stereo Earbud-white
Oraimo 10000MAh Ultimate Slim Power Bank - Black
Oraimo 20000MAH Powerbank - Power For Days
Oraimo 10000mAh Ultra Slim Fast Charging Power Bank
Oraimo 2020 Edition Tempo S - OSW-11 Multi Function Smart Watch
Oraimo SOLID 27000mAh Massive Powerbank OPB-P271D Traveller 3 Byte
Oraimo FreePods-3 2Baba Edition BT 5.2 Wireless Stereo Earbuds
Oraimo Tempo-S IP67 Waterproof Smart Watch WITH AMAZING FUNCTIONS
None
None
None
None
None
None
None
None

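Article tags 41-48 are the sponsored products. Their names show up in the browser's Inspect Element view, but bs4 does not detect them, even though it detects the other, non-sponsored products. Please help.

【Comments】:

    Tags: python web-scraping beautifulsoup request data-analysis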


    【Answer 1】:

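    Note: First, take a look at your soup / doc_nextpage. That is the ground truth of the data you are processing.

    What happens?

    In your doc_nextpage the HTML of the sponsored products is empty, which is why you get those None values.

    They are empty because the website fills them in dynamically, which requests cannot handle. It is not a browser, so it does not interpret or manipulate the data it receives.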
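    A quick way to confirm this is to count how many name elements in the parsed response carry no text. The sketch below is only an illustration: SAMPLE_HTML is a made-up stand-in for the static response, and split_names is a hypothetical helper, not part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML: the second <article> is an unfilled placeholder,
# similar to a slot that a script on the page would populate later.
SAMPLE_HTML = """
<div class="catalog">
  <article><h3 class="name">Oraimo FreePods-3</h3></article>
  <article><h3 class="name"></h3></article>
</div>
"""


def split_names(html):
    """Return (names, empty_count) for every h3.name found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    names, empty_count = [], 0
    for h3 in soup.select("h3.name"):
        # An empty tag has .string == None, which is what the loop in the
        # question printed; get_text() returns '' for it instead.
        text = h3.get_text(strip=True)
        if text:
            names.append(text)
        else:
            empty_count += 1
    return names, empty_count


names, empty_count = split_names(SAMPLE_HTML)
print(names)
print("empty placeholders:", empty_count)
```

    If the count of empty placeholders is non-zero for the real response, the missing names are not a parser problem: they were never in the HTML that requests downloaded.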

    How to fix?

    One option is to use selenium, which mimics a browser's behavior, and let BeautifulSoup process its page_source.

    Example (selenium 4)

    from bs4 import BeautifulSoup 
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service as ChromeService
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions()
    service = ChromeService(executable_path='ENTER YOUR PATH TO CHROMEDRIVER')
    driver = webdriver.Chrome(service=service, options=options)
    driver.get('https://www.jumia.com.ng/catalog/?q=oraimo&shipped_from=country_local&page=1#catalog-listing')
    
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-list="sponsored"]')))
    
    soup = BeautifulSoup(driver.page_source, 'lxml')
    
    print([x.text for x in soup.select('article h3.name')])
    
    driver.quit()  # quit() ends the session and stops chromedriver; close() only closes the window
    

    Output

    ['Oraimo FreePods-3 2Baba Edition BT 5.2 Wireless Stereo Earbuds',
     'Oraimo 27000mAh Massive Power Charing Bank Traveller 3 Byte',
     'Oraimo OPB-P116DN 10000 Mah Power-Bank Dual Fast Charging',
     'Oraimo FreePods3 True Wireless Stereo Earbuds IPX5 & Sweat Proof',
     "Oraimo Smart Watch 1.69'' IPS Screen IP68 Waterproof",
     'Oraimo FreePods-2 2Baba-version True Wireless Earbuds',
     "Oraimo Silver Edition Smart Watch 1.69'' IPS Screen IP68 Waterproof",
     'Oraimo Charger UKDualUSB OCW-U63D White',
     'Oraimo Portable Wireless Speaker Subwoofer Outdoor Sound Box',
     'Oraimo Charger Oraimo UKDualUSB OCW-U81F White',
     'Oraimo Portable Source 10000mAh Po Wer Ba Nk Oraimo OPB-P110D',
     'Oraimo Power Oraimo Bank OPB-P206DN 20KmAh',
     'Oraimo SoundPro Wireless Speaker Muti-Model Music Play',
     'Oraimo Tempo-W3 Smart Watch Health Monitor IP67 Waterproof',
     'Oraimo Car Charger Oraimo OCC-21DML Black',
     'Oraimo SoundPro-2C 10W Portable Wireless Bluetooth Speaker',
     'Oraimo Necklace 5C Neckband Wireless Earphone',
     'Oraimo 10000mAh OPTIMIZED SLIM Power-bank With LED Light',
     'Oraimo COMPACT 10000mAh Ultra Slim Power Fast Charging Bank',
     'Oraimo Mermaid Half In-ear Earphone With Mic',
     'Oraimo Necklace 3 Lite Neckband BT 5.0 Wireless Earphone',
     'Oraimo Senior BT5.0 Single Wireless Bluetooth Headsets',
     'Oraimo True Wireless Bluetooth Earbuds- Freepods 2',
     'Oraimo Pilot 20000mAh 2.1A Fast  Power Charging Bank',
     'Oraimo FreePods-2 2Baba-version True Wireless Earbuds',
     'Oraimo Air-Buds-2S Super Bass Wireless Stereo Earbuds',
     'Oraimo 20000MAH Powerbank -long Lasting PowerBank',
     'Oraimo Bluetooth Wireless SOUNDBAR SPEAKER',
     'Oraimo Shark-2 BT5.0 In-Ear Wireless Bluetooth Headphones',
     'Oraimo BoomPop Over-Ear Bluetooth Wireless Headphone',
     'Oraimo  20000MAH Powerbank -long Lasting Power For Days',
     'Oraimo FreePods-2 2Baba-Version  True Wireless Stereo Earbud',
     'Oraimo OCW-U36S Efficient And Durable USB Charger - Black',
     'Oraimo 2021 Latest Edition Smart Function Waterproof Smart Watch',
     'Oraimo OCW-U36S Efficient And Durable USB Charger - Black',
     'Oraimo Firefly-2 5.0V/2.1A Dual USB Fast Wall Charger',
     'Oraimo FreePods-2 2Baba-Version  True Wireless Stereo Earbud-white',
     'Oraimo 10000MAh Ultimate Slim Power Bank - Black',
     'Oraimo SOLID 27000mAh Massive Powerbank OPB-P271D Traveller 3 Byte',
     'Oraimo FreePods-3 2Baba Edition BT 5.2 Wireless Stereo Earbuds',
     'Oraimo Massive 27000mAh Travellers 3 Byte OPB-P271D Power Bank',
     'Oraimo 1.69" IPS Screen IP68 Waterproof Smart Watch Pro-Silver',
     'Oraimo Tempo-S IP67 Waterproof Smart Watch WITH AMAZING FUNCTIONS',
     'Oraimo FreePods-3 E104D 2Baba Edition BT 5.2 Wireless Earbuds',
     'Oraimo Tempo-S IP67 Waterproof Smart Watch',
     'Oraimo 2020 Edition Tempo S - OSW-11 Multi Function Smart Watch',
     'Oraimo 10000mAh Ultra Slim Fast Charging Power Bank',
     'Oraimo 20000MAH Powerbank - Power For Days']
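
    The example above passes service= to webdriver.Chrome, which is the Selenium 4 calling convention; on an older release that keyword is rejected with a TypeError. A minimal sketch for checking the installed version before running it follows. The helper name supports_service_keyword is made up for illustration, and the "major version 4 or newer" rule is an assumption based on the example being written for selenium 4.

```python
from importlib import metadata


def supports_service_keyword(version: str) -> bool:
    """Return True if the major version number is 4 or newer."""
    major = int(version.split(".")[0])
    return major >= 4


try:
    installed = metadata.version("selenium")
except metadata.PackageNotFoundError:
    print("selenium is not installed")
else:
    if supports_service_keyword(installed):
        print(f"selenium {installed}: service= should be accepted")
    else:
        print(f"selenium {installed}: upgrade before using service=")
```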
    

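    【Comments】:

    • Wow, thanks @HedgeHog. How can I get the data now that I know it is served dynamically?
    • Added an example solution. If this or any other answer solved your problem, please mark it as accepted. Thanks. Each question should focus on a single problem; a new question should be asked for each additional one.
    • Thanks. Even after importing from selenium.webdriver.chrome.service import Service as ChromeService I get this error on driver = webdriver.Chrome(service=service, options=options): TypeError: WebDriver.__init__() got an unexpected keyword argument 'service'
    • Did you check your selenium version? I mentioned that it works with selenium 4; it looks like you are using an older version in which the service argument is not available.
    • It works. I noticed that this line, WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//a[@data-list="sponsored"]'))), is what makes the sponsored products show up. I understand that it explicitly waits for an element to appear, but when I try the same wait on the parent of the article tags, WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH,'//div[@class="-paxs row _no-g _4cl-3cm-shs"]'))), it does not work. Could you explain what is going on?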
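
    On the last comment: the parent div is most likely already in the initial HTML (the question's own requests-based code finds it), so a presence wait on it is satisfied at once, before the page's scripts have filled in the sponsored items. The sketch below is a simplified stand-in for an explicit wait, not Selenium's implementation; wait_until and FakePage are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=2.0, poll=0.05):
    """Call `condition` repeatedly until it is truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(poll)


class FakePage:
    """Hypothetical page: container present at once, sponsored items after `delay` seconds."""

    def __init__(self, delay):
        self._ready_at = time.monotonic() + delay

    def has_container(self):
        return True  # part of the initial HTML

    def has_sponsored(self):
        return time.monotonic() >= self._ready_at  # filled in later by a script


page = FakePage(delay=0.3)

wait_until(page.has_container)   # satisfied on the first poll
print("sponsored ready after container wait:", page.has_sponsored())

wait_until(page.has_sponsored)   # keeps polling until the items exist
print("sponsored ready after sponsored wait:", page.has_sponsored())
```

    The point of the sketch is that a wait is only as useful as its condition: waiting for something that is there from the start gives the late content no time to arrive.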