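【Question title】: How can I remove the p tags from the data being scraped?
【Posted】: 2021-11-03 03:46:53
【Question】:

I'm trying to use a snippet to grab the p elements inside a div. When I run the script, the output still includes all of the HTML markup.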

import requests
from bs4 import BeautifulSoup
import time
from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
url = 'https://poocoin.app/rugcheck/0xf09b7b6ba6dab7cccc3ae477a174b164c39f4c66/dev-activity'
driver.get(url)

time.sleep(8)
soup = BeautifulSoup(driver.page_source, 'lxml')
pdata = soup.find_all('div',attrs={"class":"mt-2"})
for x in pdata:
    print (x.find('p'))
driver.quit()

Current output:

<p><a href="/tokens/0xf09b7b6ba6dab7cccc3ae477a174b164c39f4c66">Go to chart</a></p>
<p>This is a log of activity related to the token from all wallets that have had ownership of the contract.</p>
<p>Wallet activity for <a href="https://bscscan.com/address/0x410e372657e088d5b7db76346cd958b1b642b984" rel="noreferrer" target="_blank">0x410e372657e088d5b7db76346cd958b1b642b984</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/0x0000000000000000000000000000000000000000" rel="noreferrer" target="_blank">0x0000000000000000000000000000000000000000</a> on 4/17/2021, 4:59:30 AM)</span></p>

Desired output:

0xf09b7b6ba6dab7cccc3ae477a174b164c39f4c66
Wallet activity for 0x410e372657e088d5b7db76346cd958b1b642b984
(Ownership transferred to 0x0000000000000000000000000000000000000000 on 17/04/2021, 4:59:30 am)

【Comments】:

    Tags: python python-3.x selenium beautifulsoup


    【Solution 1】:

    You can use regular expressions:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    import time
    import re
    
    driver = webdriver.Chrome('chromedriver.exe')
    url = 'https://poocoin.app/rugcheck/0xf09b7b6ba6dab7cccc3ae477a174b164c39f4c66/dev-activity'
    driver.get(url)
    
    time.sleep(5)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    pdata = soup.find_all('div',attrs={"class":"mt-2"})
    lines = [str(x.find('p')) for x in pdata]
    
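    # Raw strings keep backslash sequences such as \w intact for the regex engine.
    # Token address from the "Go to chart" link
    address = re.search(r'/tokens/(0x\w+)"', lines[1]).group(1)
    print(address)
    
    # Wallet address from the first bscscan link
    activity = 'Wallet activity for ' + re.search(r'/address/(0x\w+)"', lines[3]).group(1)
    print(activity)
    
    # New owner address and the " on <date>" text that follows it
    matches = re.search(r'"_blank">(0x\w+)</a>( on [^)]+)\)', lines[3])
    ownership = '(Ownership transferred to ' + matches.group(1) + matches.group(2) + ')'
    print(ownership)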
    
    driver.quit()
    

    Output:

    0xf09b7b6ba6dab7cccc3ae477a174b164c39f4c66
    Wallet activity for 0x410e372657e088d5b7db76346cd958b1b642b984
    (Ownership transferred to 0x0000000000000000000000000000000000000000 on 16/04/2021, 21:59:30)
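
The same patterns can be tried offline against the markup quoted in the question, with no browser involved. The sketch below is only an illustration: `extract_summary` and `to_day_first` are hypothetical helper names, the sample strings are copied from the question's current output, and the date conversion assumes the page always prints dates as `M/D/YYYY, h:mm:ss AM/PM`.

```python
import re
from datetime import datetime

# Sample markup copied from the question's "Current output".
CHART_P = '<p><a href="/tokens/0xf09b7b6ba6dab7cccc3ae477a174b164c39f4c66">Go to chart</a></p>'
WALLET_P = (
    '<p>Wallet activity for <a href="https://bscscan.com/address/'
    '0x410e372657e088d5b7db76346cd958b1b642b984" rel="noreferrer" '
    'target="_blank">0x410e372657e088d5b7db76346cd958b1b642b984</a><br/>'
    '<span class="text-muted text-small">(Ownership transferred to '
    '<a href="https://bscscan.com/address/0x0000000000000000000000000000000000000000" '
    'rel="noreferrer" target="_blank">0x0000000000000000000000000000000000000000</a>'
    ' on 4/17/2021, 4:59:30 AM)</span></p>'
)


def to_day_first(stamp):
    """Convert 'M/D/YYYY, h:mm:ss AM' to 'DD/MM/YYYY, h:mm:ss am' (hypothetical helper)."""
    # Assumption: %p matches 'AM'/'PM', which holds in the default C locale.
    dt = datetime.strptime(stamp, "%m/%d/%Y, %I:%M:%S %p")
    hour12 = dt.hour % 12 or 12          # 12-hour clock without zero padding
    suffix = "am" if dt.hour < 12 else "pm"
    return f"{dt:%d/%m/%Y}, {hour12}:{dt:%M:%S} {suffix}"


def extract_summary(chart_p, wallet_p):
    """Return the three lines asked for in the question (hypothetical helper)."""
    token = re.search(r'/tokens/(0x\w+)"', chart_p).group(1)
    wallet = re.search(r'/address/(0x\w+)"', wallet_p).group(1)
    # Only the second link is followed by " on <date>)", so this skips the first.
    owner = re.search(r'"_blank">(0x\w+)</a> on ([^)]+)\)', wallet_p)
    return [
        token,
        f"Wallet activity for {wallet}",
        f"(Ownership transferred to {owner.group(1)} on {to_day_first(owner.group(2))})",
    ]


for line in extract_summary(CHART_P, WALLET_P):
    print(line)
```

Each `re.search` returns `None` when the markup changes, so real code would probably want to check the match before calling `.group()`.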
    

    【Comments】:

      【Solution 2】:

      Try:

      pdata = soup.select('div.mt-2 p')
      for x in pdata:
          print (x.text)
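
To see what this approach yields without launching a browser, it can be run on a literal HTML string. The following is a rough sketch, not the answerer's code: `paragraph_lines` is a hypothetical helper, the `div.mt-2` wrappers around the sample paragraphs are an assumption about the page structure, and the built-in `html.parser` is used in place of `lxml` so that no extra parser is needed. Replacing each `<br/>` with a newline is an added step so that the wallet line and the ownership line come out separately.

```python
from bs4 import BeautifulSoup

# Paragraphs copied from the question; the surrounding divs are assumed.
HTML = (
    '<div class="mt-2"><p><a href="/tokens/0xf09b7b6ba6dab7cccc3ae477a174b164c39f4c66">'
    'Go to chart</a></p></div>'
    '<div class="mt-2"><p>Wallet activity for <a href="https://bscscan.com/address/'
    '0x410e372657e088d5b7db76346cd958b1b642b984" rel="noreferrer" target="_blank">'
    '0x410e372657e088d5b7db76346cd958b1b642b984</a><br/>'
    '<span class="text-muted text-small">(Ownership transferred to '
    '<a href="https://bscscan.com/address/0x0000000000000000000000000000000000000000" '
    'rel="noreferrer" target="_blank">0x0000000000000000000000000000000000000000</a>'
    ' on 4/17/2021, 4:59:30 AM)</span></p></div>'
)


def paragraph_lines(html):
    """Return the visible text of every <p> under div.mt-2, one entry per line."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for p in soup.select("div.mt-2 p"):
        # Turn <br/> into a real newline so the text can be split on it.
        for br in p.find_all("br"):
            br.replace_with("\n")
        for line in p.get_text().split("\n"):
            if line.strip():
                lines.append(line.strip())
    return lines


for line in paragraph_lines(HTML):
    print(line)
```

Note that the first paragraph's text is only the link label, `Go to chart`. The token address lives in the link's `href` attribute, so it would have to be read from there (for example via `p.a["href"]`) rather than from the text.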
      

      【Comments】:
