【Question title】:How to search for predefined strings and return the whole line if a match is found
【Posted】:2021-09-12 04:33:59
【Question】:

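The snippet below partially works: it produces some results, but I need help getting it to work fully. For each URL I search the page for a set of predefined strings, and if a partial match is found, the whole line should be returned.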

from bs4 import BeautifulSoup as bs
import requests

addrlist = ['0xe56842ed550ff2794f010738554db45e60730371',
           '0xe1fd7b4c9debac3c490d8a553c455da4979482e4',
           '0x88c20beda907dbc60c56b71b102a133c1b29b053']

queries = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]
url = "https://bscscan.com/address/"


for i in addrlist:
      url = str(url) + str(i)

      r = requests.get(url)
      soup = bs(r.text,'lxml')

      pre = soup.select_one('pre.js-sourcecopyarea.editor')
      ss = (list(pre.stripped_strings)[0]).split('*')
      for s in ss:
             for query in queries:
                  if query in s:
                      print(s)
           

Current output:

Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft

AttributeError: 'NoneType' object has no attribute 'stripped_strings'

Desired output:

Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft

// Telegram : https://t.me/stackdogebsc
// Website : https://www.stack-doge.com

*Website: www.shibuttinu.com
*Telegram: https://t.me/Shibuttinu

【Comments】:

    Tags: python python-3.x beautifulsoup


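    【Solution 1】:

    The problem is the url variable. On every iteration you append the next entry of addrlist to the URL left over from the previous iteration, so only the first request hits a valid page. For the later URLs select_one() finds no matching element and returns None, which is what raises the AttributeError: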

    # 1st iteration:
    https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e60730371
    
    # 2nd iteration:
    https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e607303710xe1fd7b4c9debac3c490d8a553c455da4979482e4
    
    # 3rd iteration:
    https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e607303710xe1fd7b4c9debac3c490d8a553c455da4979482e40x88c20beda907dbc60c56b71b102a133c1b29b053
    

    Change your code like this, keeping the base URL in its own variable so it is never overwritten:

    # url = "https://bscscan.com/address/"
    baseurl = "https://bscscan.com/address/"
    
    # url = str(url) + str(i)
    url = str(baseurl) + str(i)
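
The difference between the two versions can be checked without any network access. The following is a minimal sketch, not part of the original answer; the helper names build_urls and build_urls_buggy are hypothetical and exist only for illustration.

```python
BASE_URL = "https://bscscan.com/address/"

addrlist = ['0xe56842ed550ff2794f010738554db45e60730371',
            '0xe1fd7b4c9debac3c490d8a553c455da4979482e4',
            '0x88c20beda907dbc60c56b71b102a133c1b29b053']


def build_urls(addresses, base=BASE_URL):
    """Build one URL per address, always starting from the unchanged base."""
    return [f"{base}{addr}" for addr in addresses]


def build_urls_buggy(addresses, base=BASE_URL):
    """Reproduce the bug: the running URL is overwritten on every iteration."""
    urls = []
    url = base
    for addr in addresses:
        url = url + addr  # keeps growing, as in the question's loop
        urls.append(url)
    return urls


good = build_urls(addrlist)
bad = build_urls_buggy(addrlist)

for g, b in zip(good, bad):
    # The fixed URLs all have the same length; the buggy ones keep growing.
    print(len(g), len(b))
```

Only the first entry is identical in both lists, which matches the question's symptom: the first address prints results and the next request fails.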
    

    Update

    Use a regular expression to extract the information.
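
As a quick offline illustration of what such a pattern captures, here is a small sketch run against a hard-coded string instead of a downloaded page. The sample text is made up from the lines shown in the question's desired output; it is an assumption about how the contract comments look, not a copy of the real page.

```python
import re

# Group 1 is the label, group 2 is the first run of non-whitespace after the colon.
pattern = r'(Website|Telegram|Twitter)\s*:\s*(\S+)'

# Hypothetical sample, modeled on the comment lines in the question.
sample = (
    "*Website: www.shibuttinu.com\n"
    "*Telegram: https://t.me/Shibuttinu\n"
    "// Telegram : https://t.me/stackdogebsc\n"
)

# With two capture groups, re.findall returns a list of (label, value) tuples.
pairs = re.findall(pattern, sample)
for label, value in pairs:
    print(f"{label}: {value}")
```

Note that this yields only the label and the value, not the original line, so leading markers such as `*` or `//` are dropped.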

    Full code:

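    from bs4 import BeautifulSoup as bs
    import requests
    import re
    
    addrlist = ['0xe56842ed550ff2794f010738554db45e60730371',
                '0xe1fd7b4c9debac3c490d8a553c455da4979482e4',
                '0x88c20beda907dbc60c56b71b102a133c1b29b053']
    
    # Keep the base URL separate so it is never overwritten inside the loop.
    baseurl = "https://bscscan.com/address/"
    # Group 1: the label; group 2: the first run of non-whitespace after the colon.
    pattern = r'(Website|Telegram|Twitter)\s*:\s*([^\s]+)'
    
    for i in addrlist:
        url = str(baseurl) + str(i)
    
        r = requests.get(url)
        soup = bs(r.text, 'lxml')
    
        # The contract source code is rendered inside this <pre> element.
        pre = soup.select_one('pre.js-sourcecopyarea.editor')
    
        print(url)
        # str(pre) is the literal string "None" if the element is missing,
        # so findall simply returns no matches instead of raising an error.
        for match in re.findall(pattern, str(pre)):
            print(f"{match[0]}: {match[1]}")
        print()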
    

    Output:

    https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e60730371
    Website: https://binemon.io
    Telegram: https://t.me/binemonchat
    Twitter: https://twitter.com/binemonnft
    
    https://bscscan.com/address/0xe1fd7b4c9debac3c490d8a553c455da4979482e4
    Telegram: https://t.me/stackdogebsc
    Website: https://www.stack-doge.com
    
    https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053
    Website: www.shibuttinu.com
    Telegram: https://t.me/Shibuttinu
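
If the goal is the whole matching line, including any leading `*` or `//`, one option is to match per line with re.MULTILINE rather than capturing only the label and value. The following is a minimal sketch under stated assumptions: the helper name matching_lines and the sample text are hypothetical, and the sample only imitates the comment lines from the question's desired output.

```python
import re

KEYWORDS = ("Website", "Telegram", "Twitter")

# Match any full line that contains one of the keywords followed by a colon.
# With re.MULTILINE, ^ and $ anchor at the start and end of every line.
_alternation = "|".join(re.escape(k) for k in KEYWORDS)
LINE_PATTERN = re.compile(rf"^.*\b(?:{_alternation})\b\s*:.*$", re.MULTILINE)


def matching_lines(text):
    """Return every line of text that mentions a keyword, stripped of outer whitespace."""
    return [m.group(0).strip() for m in LINE_PATTERN.finditer(text)]


# Hypothetical sample standing in for the text of the <pre> element.
sample = """\
/**
 *Website: www.shibuttinu.com
 *Telegram: https://t.me/Shibuttinu
 */
// Telegram : https://t.me/stackdogebsc
// Website : https://www.stack-doge.com
pragma solidity ^0.8.0;
"""

lines = matching_lines(sample)
for line in lines:
    print(line)
```

In the scraping loop this could be applied to the text of the `pre` element, for example `matching_lines(pre.get_text())` after checking that `pre` is not None.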
    

    【Discussion】:

    • Thanks. I made the fix you suggested. My remaining question is how to return the matching lines.