【Question title】: How to grab data fields from a dynamic URL using Selenium in Python
【Posted】: 2021-10-30 21:06:04
【Question】:

I can extract some of the data from the URL, but some fields are still missing.

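import time

from bs4 import BeautifulSoup
from selenium import webdriver

# The chromedriver path is passed positionally, as in the original post;
# newer Selenium releases expect a Service object instead.
driver = webdriver.Chrome('chromedriver.exe')
url = 'https://poocoin.app/tokens/0xe56842ed550ff2794f010738554db45e60730371'
driver.get(url)

# Give the page's JavaScript time to render before reading the DOM.
time.sleep(8)
soup = BeautifulSoup(driver.page_source, 'lxml')

# get_text() returns only the visible text, so the href values of the links are lost.
data = soup.find('div', class_='overflow-auto unpad-3 ps-3').get_text()
print(data)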

Current output:

Pc v2 | BIN/BNB LP Holdings: 4,694.84 BNB ($2,221,326) | Chart | Holders
Pc v2 | BIN/BUSD LP Holdings: 0.03 BUSD ($0) | Chart | Holders
Pc v2 | BIN/USDT LP Holdings: 0.00 USDT ($0) | Chart | Holders

Desired output:

Pc v2 | BIN/BNB LP Holdings: 4,697.12 BNB ($2,226,112)
    | Chart     https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xe432afB7283A08Be24E9038C30CA6336A7cC8218#tokenAnalytics
    | Holders   https://bscscan.com/token/0xe432afB7283A08Be24E9038C30CA6336A7cC8218#balances
Pc v2 | BIN/BUSD LP Holdings: 0.03 BUSD ($0)
    | Chart     https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56?a=0x61ca44133a0984EF96E2358947463C41837CaD50#tokenAnalytics
    | Holders   https://bscscan.com/token/0x61ca44133a0984EF96E2358947463C41837CaD50#balances
Pc v2 | BIN/USDT LP Holdings: 0.00 USDT ($0)
    | Chart     https://bscscan.com/token/0x55d398326f99059ff775485246999027b3197955?a=0x9eb614F1c85414328EdAA1508C626993d45B1453#tokenAnalytics
    | Holders   https://bscscan.com/token/0x9eb614F1c85414328EdAA1508C626993d45B1453#balances

【Comments】:

    Tags: python selenium beautifulsoup


    【Answer 1】:

    To collect just the links in one line, call find_all on the a tags and filter by their text:

    all_links = [a['href'] for a in soup.find('div', class_='overflow-auto unpad-3 ps-3').find_all('a', string=['Chart', 'Holders'])]
    

    Output:

    ['https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xe432afB7283A08Be24E9038C30CA6336A7cC8218#tokenAnalytics',
     'https://bscscan.com/token/0xe432afB7283A08Be24E9038C30CA6336A7cC8218#balances',
     'https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56?a=0x61ca44133a0984EF96E2358947463C41837CaD50#tokenAnalytics',
     'https://bscscan.com/token/0x61ca44133a0984EF96E2358947463C41837CaD50#balances',
     'https://bscscan.com/token/0x55d398326f99059ff775485246999027b3197955?a=0x9eb614F1c85414328EdAA1508C626993d45B1453#tokenAnalytics',
     'https://bscscan.com/token/0x9eb614F1c85414328EdAA1508C626993d45B1453#balances']
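
The filter can be tried without a browser. The sketch below runs the same `find_all` call against a small static snippet; the markup and the example.com URLs are hypothetical, modeled only on the class names quoted in this thread, so the real page may be structured differently. It uses `string=`, which newer Beautiful Soup releases prefer over the equivalent `text=` argument.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names used in this thread;
# the real page may differ.
HTML = """
<div class="overflow-auto unpad-3 ps-3">
  <div class="text-xs my-3">
    <a href="https://example.com/pool" target="_blank">Pc v2</a> | BIN/BNB LP Holdings:
    <span>1.00 BNB ($1)</span> |
    <a href="https://example.com/chart">Chart</a> |
    <a href="https://example.com/holders">Holders</a>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
container = soup.find("div", class_="overflow-auto unpad-3 ps-3")

# A list passed to string= matches any <a> whose text equals one of the entries,
# so the "Pc v2" link is skipped.
links = [a["href"] for a in container.find_all("a", string=["Chart", "Holders"])]
print(links)
```

The match is on the exact link text, so a link whose text carries extra whitespace would need a regular expression or a function as the filter instead.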
    

    Formatted as you requested:

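    rows = soup.find('div', class_='overflow-auto unpad-3 ps-3').find_all('div', class_='text-xs my-3')
    for row in rows:
        # Text of the first link that opens in a new tab (the leading label)
        print(row.find('a', attrs={'target': '_blank'}).get_text(), end='')
        # The first two text nodes that follow the first link
        print(' '.join(row.find('a').find_next_siblings(string=True)[:2]), end='')
        # Text of the first span in the row
        print(row.find('span').get_text())
        # Label and href of each Chart/Holders link, one per line
        links = [a.get_text() + ' ' + a['href'] for a in row.find_all('a', string=['Chart', 'Holders'])]
        print(*links, sep='\n')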
        
    

    Output:

    Pc v2 | BIN/BNB LP Holdings: 4,716.76 BNB ($2,234,449)
    Chart https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xe432afB7283A08Be24E9038C30CA6336A7cC8218#tokenAnalytics
    Holders https://bscscan.com/token/0xe432afB7283A08Be24E9038C30CA6336A7cC8218#balances
    Pc v2 | BIN/BUSD LP Holdings: 0.03 BUSD ($0)
    Chart https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56?a=0x61ca44133a0984EF96E2358947463C41837CaD50#tokenAnalytics
    Holders https://bscscan.com/token/0x61ca44133a0984EF96E2358947463C41837CaD50#balances
    Pc v2 | BIN/USDT LP Holdings: 0.00 USDT ($0)
    Chart https://bscscan.com/token/0x55d398326f99059ff775485246999027b3197955?a=0x9eb614F1c85414328EdAA1508C626993d45B1453#tokenAnalytics
    Holders https://bscscan.com/token/0x9eb614F1c85414328EdAA1508C626993d45B1453#balances
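
The layout in the question puts each link on its own indented line, with the URLs aligned in a column. A minimal sketch of that formatting step, assuming the summary text and the (label, URL) pairs have already been extracted; `format_pool` is a hypothetical helper, not part of Selenium or Beautiful Soup, and the example.com URLs are placeholders.

```python
def format_pool(summary, links):
    """Render one pool summary followed by indented '| label url' lines."""
    lines = [summary]
    for label, url in links:
        # Pad the label to a fixed width so the URLs line up in one column.
        lines.append(f"    | {label:<9} {url}")
    return "\n".join(lines)


text = format_pool(
    "Pc v2 | BIN/BUSD LP Holdings: 0.03 BUSD ($0)",
    [("Chart", "https://example.com/chart"),
     ("Holders", "https://example.com/holders")],
)
print(text)
```

Inside the loop from the answer, the pairs could be built with `[(a.get_text(), a['href']) for a in ...]` and passed to the helper in place of the final `print`.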
    

    【Comments】:

    • Thanks. How do I split the output into the format I need?
    • I have updated my answer, please take a look.
    【Answer 2】:

    Try this:

    soup = BeautifulSoup(driver.page_source, 'html5lib')

    rows = soup.find_all('div', class_='text-xs my-3')
    for row in rows:
        # Visible text of the row, including the "Chart" and "Holders" link labels
        data = row.get_text()
        # Look up each link by its exact text and read its href
        chart = "Chart: {}".format(row.find('a', string='Chart')['href'])
        holder = "Holders: {}".format(row.find('a', string='Holders')['href'])
        print(data)
        print(chart)
        print(holder)
    

    Output:

    Pc v2 | BIN/BNB LP Holdings:4,708.86 BNB ($2,239,013) | Chart | Holders
    Chart: https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c?a=0xe432afB7283A08Be24E9038C30CA6336A7cC8218#tokenAnalytics
    Holders: https://bscscan.com/token/0xe432afB7283A08Be24E9038C30CA6336A7cC8218#balances
    Pc v2 | BIN/BUSD LP Holdings:0.03 BUSD ($0) | Chart | Holders
    Chart: https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56?a=0x61ca44133a0984EF96E2358947463C41837CaD50#tokenAnalytics
    Holders: https://bscscan.com/token/0x61ca44133a0984EF96E2358947463C41837CaD50#balances
    Pc v2 | BIN/USDT LP Holdings:0.00 USDT ($0) | Chart | Holders
    Chart: https://bscscan.com/token/0x55d398326f99059ff775485246999027b3197955?a=0x9eb614F1c85414328EdAA1508C626993d45B1453#tokenAnalytics
    Holders: https://bscscan.com/token/0x9eb614F1c85414328EdAA1508C626993d45B1453#balances
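
In this output the summary line still ends with the "| Chart | Holders" labels, which the desired format in the question does not have. One way to drop them is sketched below, assuming the row text is pipe-separated as shown above; `strip_link_labels` is a hypothetical helper name.

```python
def strip_link_labels(row_text, labels=("Chart", "Holders")):
    """Remove trailing pipe-separated link labels from a row's text."""
    parts = [part.strip() for part in row_text.split("|")]
    # Pop labels from the end only, so the same word earlier in the row is kept.
    while parts and parts[-1] in labels:
        parts.pop()
    return " | ".join(parts)


row_text = "Pc v2 | BIN/BUSD LP Holdings:0.03 BUSD ($0) | Chart | Holders"
cleaned = strip_link_labels(row_text)
print(cleaned)
```

Applied to `row.get_text()` in the loop, this leaves only the pool summary on the first line.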
    
