Beautiful Soup: scraping an auction site that clears the sold item div on completion of the auction

【Question title】: Beautiful Soup, scraping an auction site that clears the sold item div on completion of auction
【Posted】: 2020-02-18 22:10:19
【Question】:

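I'm having trouble with a scraper I'm building. I want to scrape information from an auction site. The problem is that every time I request the HTML I am effectively refreshing the page, and when this auction site's page is refreshed, any lots whose auctions have ended are removed, so I lose the data I want to capture.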

import time

def soldDetection(soup):
    # Check each lot's price label. Once it reads "Closing bid" the lot is closed,
    # so send the entire container to getData() to extract the price and details.
    while True:
        getPage(url)  # re-requests the page; note that `soup` is not updated from the result
        container = soup.find_all('li', class_='current-price')
        for child in container:
            label = child.span.contents[0]
            if label == 'Closing bid':
                # grab entire div for the card with data for the getData()
                print('Found')
                parentDiv = label.find_parent('div', class_='lot-single')
                getData(parentDiv)
                return parentDiv
        time.sleep(1)
        print('Nothing Sold')

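In my browser, if I don't refresh, the content of the "current-price" span changes from "Current bid" to "Closing bid", but if I do refresh, the item is cleared from the HTML. Is there a way to get Beautiful Soup to watch for this change without refreshing and clearing the page? I'm worried that bs4 may not be the right tool for this job. If it isn't, what should I use?

Thanks,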

Tags: web-scraping beautifulsoup

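【Solution 1】:

I found a solution: I use Selenium to open a browser and watch for the changes that JavaScript makes to the page. I then collect all of the HTML and pass it to my Beautiful Soup function to navigate the tree.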

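import bs4
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get(websiteAddress)
...
        # Inside the polling loop (elided above): look for any element whose text contains 'Closing bid'.
        # find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which newer Selenium releases removed.
        elems = browser.find_elements(By.XPATH, "//*[contains(text(),'Closing bid')]")
        if not elems:
            print('Not Found')
            continue
        label = elems[0].text
        if label == 'Closing bid':
            # grab entire div for the card with data for the getData()
            soup = bs4.BeautifulSoup(browser.page_source, 'html.parser')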
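
The idea in this answer is to keep one browser session open and re-read its current HTML on a timer, rather than requesting the page again. A minimal sketch of such a polling loop is below. The names `wait_for_marker` and `get_html`, and the interval and timeout values, are hypothetical and chosen only for illustration; `get_html` stands in for something like `lambda: browser.page_source`, and the list of snapshots simulates a page that changes over time.

```python
import time

def wait_for_marker(get_html, marker="Closing bid", interval=1.0, timeout=60.0):
    """Poll get_html() until marker appears; return that HTML, or None on timeout.

    get_html is any zero-argument callable returning the current page HTML.
    With Selenium it could be `lambda: browser.page_source` (an assumption
    here), which reads the already-open page without reloading it.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)

# Stand-in for a live page: the third read shows the lot as closed.
snapshots = iter([
    "<span>Current bid</span>",
    "<span>Current bid</span>",
    "<span>Closing bid</span>",
])
result = wait_for_marker(lambda: next(snapshots), interval=0.01, timeout=5.0)
print(result)
```

Passing the HTML source in as a callable keeps the loop independent of Selenium, so it can be tried out without opening a browser. The timeout is there so the loop cannot run forever if no lot ever closes.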

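Once a snapshot of the HTML has been captured (for example from `browser.page_source`), the closed lots can be picked out with Beautiful Soup. The following is only a sketch: it assumes markup shaped like the selectors used in the question (`div.lot-single` wrapping an `li.current-price` with a `span` label), and both `find_closed_lots` and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

def find_closed_lots(html):
    """Return the 'lot-single' divs whose price label reads 'Closing bid'."""
    soup = BeautifulSoup(html, "html.parser")
    closed = []
    for price in soup.find_all("li", class_="current-price"):
        span = price.find("span")
        if span is None:
            continue  # no label in this price element, skip it
        if span.get_text(strip=True) == "Closing bid":
            card = price.find_parent("div", class_="lot-single")
            if card is not None:
                closed.append(card)
    return closed

# Made-up markup for illustration; the real site's structure may differ.
SAMPLE_HTML = """
<div class="lot-single" id="lot-1">
  <ul><li class="current-price"><span>Current bid</span> 10</li></ul>
</div>
<div class="lot-single" id="lot-2">
  <ul><li class="current-price"><span>Closing bid</span> 25</li></ul>
</div>
"""

closed_ids = [card["id"] for card in find_closed_lots(SAMPLE_HTML)]
print(closed_ids)
```

Returning every matching card, rather than stopping at the first, covers the case where several lots close between two polls.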