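【Question Title】: Grabbing all data fields from a div in python beautifulsoup
【Posted】: 2021-09-20 08:30:29
【Question】:

The snippet below worked fine until a few days ago. Is there an easy way to extract all of the data inside the div with class="row mb-4"? The idea is that the script should keep working even if further changes are made to the page.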

import re

import requests
from bs4 import BeautifulSoup

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0'
}

url = "https://bscscan.com/token/"
token = "0x4ce1a5cb12151423ea479cfd0c52ec5021d108d8"
tokenurl = str(url)+str(token)

# Pass headers by keyword: the second positional argument of requests.get is params.
contractpage = requests.get(tokenurl, headers=header)
ca = BeautifulSoup(contractpage.content, 'html.parser')
tokenholders = ca.find(id='ContentPlaceHolder1_tr_tokenHolders').get_text()
tokenholdersa = (((tokenholders.strip().strip("Holders:")).strip()).strip(" a ")).strip()
tholders = ((((tokenholders.strip()).strip("Holders:")).strip()).strip(" a ")).strip()
tokenaname = ca.find('span', class_='text-secondary small').get_text().strip()

def get_transfer_count(token: str) -> str:
    with requests.Session() as s:
        s.headers = {'User-Agent':'Mozilla/5.0'}
        r = s.get(f'https://bscscan.com/token/{token}') 
        try:   
            sid = re.search(r"var sid = '(.*?)'", r.text).group(1)
            r = s.get(f'https://bscscan.com/token/generic-tokentxns2?m=normal&contractAddress={token}&a=&sid={sid}&p=1')
            return re.search(r"var totaltxns = '(.*?)'", r.text).group(1)
        except:
            pass
transcount = get_transfer_count(token)

print ("Token: ",     tokenaname)
print ("Holders: ",   tholders)
print ("Transfers: ", transcount)

Previous output:

Token:     Binemon
Holders:   27,099
Transfers: 439,636

Desired, improved output:

Token:  Binemon
PRICE:  $0.01 @ 0.000037 BNB (-22.41%)
Fully Diluted Market Cap: $14,011,783.50

Total Supply:   975,000,000 BIN
Holders:        27,099 addresses
Transfers:      439,636
Contract:       0xe56842ed550ff2794f010738554db45e60730371
Decimals:       18
Official Site:  https://binemon.io/
Social Profiles:
    https://twitter.com/binemonnft
    https://t.me/binemonchat
    https://docs.binemon.io/
    https://coinmarketcap.com/currencies/binemon/
    https://www.coingecko.com/en/coins/binemon/

【Comments】:

  • Your sample code gives me a 403. :(

Tags: python python-3.x web-scraping beautifulsoup


【Answer 1】:

Try:

import requests
from bs4 import BeautifulSoup

header = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0",
}
tokenurl = (
    "https://bscscan.com/token/0x7083609fce4d1d8dc0c979aab8c869ea2c873402"
)

contractpage = requests.get(tokenurl, headers=header)
ca = BeautifulSoup(contractpage.content, "html.parser")

name = ca.h1.span.get_text(strip=True)
price = ca.select_one(".card-body .d-block").get_text(strip=True)
cap = ca.select_one("#pricebutton").get_text(strip=True)

print("Token:", name)
print("PRICE:", price)
print("Fully Diluted Market Cap:", cap)
print()

for c in ca.select(".row .col-md-8"):
    pt = c.find_previous(class_="col-md-4").get_text(strip=True)
    t = c.get_text(strip=True, separator=" ").split("(")[0]
    if pt == "Social Profiles:":
        links = [a["href"].strip() for a in c.select("a")]
        print(pt, *links, sep="\n\t")
    else:
        print(pt, t)

Prints:

Token: Binance-Peg Polkadot Token
PRICE: $30.35@ 0.079643 BNB(-10.39%)
Fully Diluted Market Cap: $485,657,455.49

Total Supply: 15,999,999.991309 DOT 
Holders: 80,065 addresses
Transfers: -
Contract: 0x7083609fce4d1d8dc0c979aab8c869ea2c873402
Decimals: 18
Official Site: https://polkadot.network/
Social Profiles:
        https://polkadot.network/blog
        https://reddit.com/r/dot
        https://twitter.com/polkadotnetwork
        https://github.com/w3f
        https://polkadot.network/PolkaDotPaper.pdf
        https://coinmarketcap.com/currencies/polkadot-new/
        https://www.coingecko.com/en/coins/polkadot/
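
The loop above pairs each value cell (`col-md-8`) with the label cell that precedes it (`col-md-4`), so rows added to the page later are picked up without code changes. Below is a minimal offline sketch of the same pairing that collects the rows into a dict instead of printing them. The `SAMPLE_HTML` markup and the `extract_fields` helper are assumptions made for illustration: the markup is modeled on the class names used in the answer, not copied from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names the answer selects;
# the real page may be structured differently.
SAMPLE_HTML = """
<div class="row mb-4">
  <div class="col-md-4">Holders:</div>
  <div class="col-md-8">27,099 addresses</div>
</div>
<div class="row mb-4">
  <div class="col-md-4">Social Profiles:</div>
  <div class="col-md-8">
    <a href="https://twitter.com/binemonnft">Twitter</a>
    <a href="https://t.me/binemonchat">Telegram</a>
  </div>
</div>
"""


def extract_fields(html: str) -> dict:
    """Map each row label to its text, or to a list of links for social profiles."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for value_cell in soup.select(".row .col-md-8"):
        # The label is the closest preceding element with class col-md-4.
        label_cell = value_cell.find_previous(class_="col-md-4")
        if label_cell is None:
            continue
        label = label_cell.get_text(strip=True).rstrip(":")
        if label == "Social Profiles":
            fields[label] = [a["href"].strip() for a in value_cell.select("a[href]")]
        else:
            fields[label] = value_cell.get_text(" ", strip=True)
    return fields


for label, value in extract_fields(SAMPLE_HTML).items():
    print(label, value)
```

Returning a dict keeps scraping separate from output formatting, so a caller can look up a field such as `"Holders"` by name, assuming the label text on the page stays the same.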

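【Comments】:

  • Thanks. How can I get my get_transfer_count part working? I also want to fetch the Transfers data.
  • @rbutrnz I suggest opening a new question for that. I'll try to take a look at it!
  • I posted another question, "snippet worked before". I hope you can take a look at it.
  • I had left out import re in my code. It works now.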
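
The `get_transfer_count` function from the question reads values out of inline JavaScript with `re.search(...).group(1)`. That call raises `AttributeError` when the pattern is absent, and the function's bare `except` hides both this and a missing `import re`. The sketch below shows one way to make the lookup explicit. The `extract_js_var` helper and the sample string are hypothetical and only mirror the `var sid = '...'` pattern used in the question.

```python
import re
from typing import Optional


def extract_js_var(page_text: str, name: str) -> Optional[str]:
    """Return the single-quoted value assigned to a JavaScript var, or None if absent."""
    match = re.search(rf"var {re.escape(name)} = '(.*?)'", page_text)
    return match.group(1) if match else None


# Hypothetical page fragment shaped like the variables the question searches for.
sample = "<script>var sid = 'abc123'; var totaltxns = '439,636';</script>"

print(extract_js_var(sample, "sid"))        # abc123
print(extract_js_var(sample, "totaltxns"))  # 439,636
print(extract_js_var(sample, "missing"))    # None
```

Checking the match object before calling `group` lets the caller tell "the page layout changed" apart from other failures, such as network errors, which can then surface instead of being swallowed.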