【Question title】: BeautifulSoup Python3 append multiple links output to single list
【Posted】: 2020-10-03 19:24:00
【Question】:
    import requests
    from bs4 import BeautifulSoup
    import re

    links = ["https://bitcointalk.org/index.php?board=159.0",
             "https://bitcointalk.org/index.php?board=159.40",
             "https://bitcointalk.org/index.php?board=159.80"]


    def get_span():
        for url in links:
            page = requests.get(url) 
            soup = BeautifulSoup(page.text, "html.parser") 
            t1 = str(soup.findAll("span", id=re.compile('^msg_')))
            print(t1)
            t2 = [x for x in re.findall(r'\d+\.\d+', t1)]  
            t2.sort(key=float, reverse=True)  

            t3 = "https://bitcointalk.org/index.php?topic"
            for hn in t2:
                if len(hn) >= 9:
                    hn = '{}={}'.format(t3, hn)
                    print(hn)


    get_span()

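Hi! My code iterates over the items in `links`, finds the spans with `id=msg_`, extracts all the numbers inside those spans and sorts them in descending order. The problem is that it processes the first item and prints its output, then the second item, and so on, so the output consists of 3 separate lists, each sorted on its own. I want the output for all 3 items in `links` sorted together in a single list.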
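The behaviour described above (three separately sorted groups instead of one) can be reproduced without any network access. This is a stdlib-only sketch; the per-page ID strings are made-up sample data (hypothetical), standing in for the numbers the regex pulls out of each page, and the two function names are illustrative only.

```python
# Hypothetical topic IDs extracted from three pages (made-up sample data).
pages = [
    ["5255389.0", "5254480.0", "5255494.0"],
    ["5250998.0", "5255416.0"],
    ["5254720.0", "5252504.0"],
]


def sort_per_page(pages):
    """Roughly what the loop above does: each page is sorted on its own."""
    result = []
    for ids in pages:
        result.append(sorted(ids, key=float, reverse=True))
    return result  # a list of separately sorted lists


def sort_once(pages):
    """Collect the IDs from every page first, then sort one combined list."""
    combined = []
    for ids in pages:
        combined.extend(ids)
    return sorted(combined, key=float, reverse=True)


print(sort_per_page(pages))
print(sort_once(pages))
```

The only structural change is moving the sort out of the loop so it runs once over everything collected.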

【Question discussion】:

Tags: python python-3.x beautifulsoup request


【Solution 1】:

You can use `list.extend` to add the items from every page to one list, then sort that final list once before returning it.

For example:

import re
import requests
from bs4 import BeautifulSoup


links = ["https://bitcointalk.org/index.php?board=159.0",
         "https://bitcointalk.org/index.php?board=159.40",
         "https://bitcointalk.org/index.php?board=159.80"]

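def get_span(links):
    rv = []  # one list shared by all pages
    r = re.compile(r'\d{7,}\.\d+')  # topic ID such as 5255494.0
    for url in links:
        soup = BeautifulSoup(requests.get(url).content, "html.parser")
        # add every matching topic href from this page to the shared list
        rv.extend(a['href'] for a in soup.select('span[id^="msg_"] > a') if r.search(a['href']))
    # sort once, after all pages are collected, highest topic ID first
    return sorted(rv, key=lambda k: float(r.search(k).group(0)), reverse=True)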


all_links = get_span(links)

# print links on screen:
for link in all_links:
    print(link)

Prints:

https://bitcointalk.org/index.php?topic=5255494.0
https://bitcointalk.org/index.php?topic=5255416.0
https://bitcointalk.org/index.php?topic=5255389.0
https://bitcointalk.org/index.php?topic=5255376.0
https://bitcointalk.org/index.php?topic=5255316.0
https://bitcointalk.org/index.php?topic=5254720.0
https://bitcointalk.org/index.php?topic=5254480.0
https://bitcointalk.org/index.php?topic=5254448.0
https://bitcointalk.org/index.php?topic=5254287.0
https://bitcointalk.org/index.php?topic=5252504.0
https://bitcointalk.org/index.php?topic=5251621.0
https://bitcointalk.org/index.php?topic=5250998.0
https://bitcointalk.org/index.php?topic=5250388.0
https://bitcointalk.org/index.php?topic=5250185.0
https://bitcointalk.org/index.php?topic=5248406.0
https://bitcointalk.org/index.php?topic=5247112.0

... and so on.
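The answer uses `list.extend` rather than `list.append` because `extend` adds each item of an iterable to the list individually, while `append` adds the whole iterable as one nested element. This stdlib-only sketch uses made-up href values (hypothetical sample data) to show both, followed by the same regex-based sort key the answer uses; `topic_key` is an illustrative name, not part of the answer.

```python
import re

# Same pattern as in the answer: 7+ digits, a dot, then more digits.
TOPIC_RE = re.compile(r'\d{7,}\.\d+')

# Hypothetical hrefs collected from two pages (made-up sample data).
page1 = ["https://bitcointalk.org/index.php?topic=5254480.0",
         "https://bitcointalk.org/index.php?topic=5255494.0"]
page2 = ["https://bitcointalk.org/index.php?topic=5255416.0"]

nested = []
nested.append(page1)  # adds ONE element: the whole list
nested.append(page2)

flat = []
flat.extend(page1)    # adds each href as its own element
flat.extend(page2)


def topic_key(href):
    """Sort key: the numeric topic ID found in the URL."""
    return float(TOPIC_RE.search(href).group(0))


ordered = sorted(flat, key=topic_key, reverse=True)
for href in ordered:
    print(href)
```

With `append`, `nested` holds two lists and cannot be sorted by topic ID directly; with `extend`, `flat` holds three hrefs that sort into one list.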

EDIT: If you want to display the link text next to the URL, you can use this example:

import re
import requests
from bs4 import BeautifulSoup


links = ["https://bitcointalk.org/index.php?board=159.0",
         "https://bitcointalk.org/index.php?board=159.40",
         "https://bitcointalk.org/index.php?board=159.80"]

def get_span(links):
    rv = []
    r = re.compile(r'\d{7,}\.\d+')
    for url in links:
        soup = BeautifulSoup(requests.get(url).content, "html.parser")
        # collect (href, link text) tuples instead of bare hrefs
        rv.extend((a['href'], a.text) for a in soup.select('span[id^="msg_"] > a') if r.search(a['href']))
    # k[0] is the href, so the topic ID is still read from the URL
    return sorted(rv, key=lambda k: float(r.search(k[0]).group(0)), reverse=True)


all_links = get_span(links)

# print links on screen:
for link, text in all_links:
    print('{} {}'.format(link, text))

Prints:

https://bitcointalk.org/index.php?topic=5255494.0 NUL Token - A new hyper-deflationary experiment! Airdrop!
https://bitcointalk.org/index.php?topic=5255416.0 KEEP NETWORK - A privacy layer for Ethereum
https://bitcointalk.org/index.php?topic=5255389.0 [ANN] ICO - OBLICHAIN | Blockchain technology at the service of creative genius
https://bitcointalk.org/index.php?topic=5255376.0 UniChain - The 4th Generation Blockchain Made For The Smart Society 5.0
https://bitcointalk.org/index.php?topic=5255316.0 INFINITE RICKS ! First Multiverse Cryptocurrency ! PoS 307%
https://bitcointalk.org/index.php?topic=5254720.0 [GMC] GameCredits - Unofficial & Unmoderated for Censored Posts.
https://bitcointalk.org/index.php?topic=5254480.0 [ANN] [BTCV] Bitcoin VaultA higher standard in security
https://bitcointalk.org/index.php?topic=5254448.0 [ANN] Silvering (SLVG) token - New Silver Asset Backed Cryptocurrency

... and so on.
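In the edited version each collected item is an `(href, text)` tuple, so the sort key reads the URL from `k[0]` and the print loop unpacks both parts. A stdlib-only sketch with made-up tuples (hypothetical sample data; the titles are borrowed from the output above, the ordering in the input is invented):

```python
import re

TOPIC_RE = re.compile(r'\d{7,}\.\d+')

# Hypothetical (href, link text) pairs, shaped like what the edited get_span collects.
pairs = [
    ("https://bitcointalk.org/index.php?topic=5254720.0",
     "[GMC] GameCredits - Unofficial & Unmoderated for Censored Posts."),
    ("https://bitcointalk.org/index.php?topic=5255494.0",
     "NUL Token - A new hyper-deflationary experiment! Airdrop!"),
    ("https://bitcointalk.org/index.php?topic=5255416.0",
     "KEEP NETWORK - A privacy layer for Ethereum"),
]

# Sort by the numeric topic ID taken from the URL (element 0), highest first.
ordered = sorted(pairs,
                 key=lambda k: float(TOPIC_RE.search(k[0]).group(0)),
                 reverse=True)

# Unpack each tuple when printing, as the answer's loop does.
for link, text in ordered:
    print('{} {}'.format(link, text))
```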

【Discussion】:

  • Thanks! It works the way I wanted. Do you perhaps know how to sort the text next to the posts? E.g. bitcointalk.org/index.php?topic=3920469.0 [ANN][ICO]HoweyCoins: ' 'The only BitcoinTalk-endorsed ICO - guaranteed profits