How to get all URLs within a page from oddsportal?

【Question title】: How to get all URLs within a page from oddsportal?
【Posted】: 2021-07-01 08:47:58
【Question description】:

I have code that scrapes all the URLs from the oddsportal.com main page. I also want the follow-on links from every page under each parent URL. For example, https://www.oddsportal.com/soccer/africa/africa-cup-of-nations/results/ links to further pages, i.e. https://www.oddsportal.com/soccer/africa/africa-cup-of-nations/results/, https://www.oddsportal.com/soccer/africa/africa-cup-of-nations-2019/results/, and so on. How can I get those?

My existing code:

import requests
import bs4 as bs
import pandas as pd
url = 'https://www.oddsportal.com/results/#soccer'
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'}
resp = requests.get(url, headers=headers)
soup = bs.BeautifulSoup(resp.text, 'html.parser')
base_url = 'https://www.oddsportal.com'
a = soup.findAll('a', attrs={'foo': 'f'})

# This set will have all the URLs of the main page
s = set()
for i in a:
    s.add(base_url + i['href'])
s = list(s)
# This will filter for all soccer URLs
s = [x for x in s if '/soccer/' in x]
s = pd.DataFrame(s)
print(s)

I am very new to web scraping, hence this question.

【Question comments】:

@Qharr How would I do that?
Why not visit each URL you collected in s, the same way you did for the main page?

Tags: python web-scraping beautifulsoup

【Solution 1】:

You can locate the main_div tag by its class attribute, call find_all on it to get the a tags, and loop over them to extract each href.

from bs4 import BeautifulSoup
import requests

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'}
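source = requests.get("https://www.oddsportal.com/soccer/africa/africa-cup-of-nations/results/", headers=headers)

soup = BeautifulSoup(source.text, 'html.parser')
# The div with these classes holds the links to each edition's results page
main_div = soup.find("div", class_="main-menu2 main-menu-gray")
# Collect every anchor inside that div and print its (relative) href
a_tag = main_div.find_all("a")
for i in a_tag:
    print(i['href'])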

Output:

/soccer/africa/africa-cup-of-nations/results/
/soccer/africa/africa-cup-of-nations-2019/results/
/soccer/africa/africa-cup-of-nations-2017/results/
/soccer/africa/africa-cup-of-nations-2015/results/
/soccer/africa/africa-cup-of-nations-2013/results/
/soccer/africa/africa-cup-of-nations-2012/results/
/soccer/africa/africa-cup-of-nations-2010/results/
/soccer/africa/africa-cup-of-nations-2008/results/
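
The hrefs printed above are relative paths, so they need the site's base URL before they can be requested. Below is a minimal sketch of that step, assuming the same base URL as in the question's code. `to_absolute` is a hypothetical helper name, not part of any library; `urljoin` comes from Python's standard library.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.oddsportal.com"  # same base as in the question's code


def to_absolute(hrefs, base=BASE_URL):
    """Return absolute URLs for the given hrefs, de-duplicated, order preserved.

    Hypothetical helper for illustration only.
    """
    seen = set()
    result = []
    for href in hrefs:
        url = urljoin(base, href)  # leaves already-absolute URLs unchanged
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


hrefs = [
    "/soccer/africa/africa-cup-of-nations/results/",
    "/soccer/africa/africa-cup-of-nations-2019/results/",
    "/soccer/africa/africa-cup-of-nations/results/",  # duplicate, dropped
]
for url in to_absolute(hrefs):
    print(url)
```

Unlike plain string concatenation (`base_url + i['href']`), `urljoin` also copes with hrefs that are already absolute.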

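【Comments】:

Doesn't the OP need this code to visit each URL in order to get the right sub-URLs?
I did not understand the question about sub-URLs; I extracted the URLs you wanted from the result above!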
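
The idea raised in the comments, visiting each parent URL and collecting the links found on it, could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: `collect_sub_urls`, `run_against_site`, `fetch_html` and `extract_hrefs` are hypothetical names, and the sketch assumes every parent page still serves the `main-menu2 main-menu-gray` div in its static HTML, which may not hold for all pages.

```python
from urllib.parse import urljoin


def collect_sub_urls(parent_urls, fetch_html, extract_hrefs):
    """Visit each parent URL and map it to the absolute links found on its page.

    fetch_html(url) -> str and extract_hrefs(html) -> iterable of href strings
    are supplied by the caller (hypothetical helpers, not library functions).
    """
    found = {}
    for parent in parent_urls:
        html = fetch_html(parent)
        # Resolve relative hrefs against the page they came from;
        # the set removes duplicates and sorting gives stable output.
        found[parent] = sorted({urljoin(parent, href) for href in extract_hrefs(html)})
    return found


def run_against_site(parent_urls):
    """Wire the loop to requests and BeautifulSoup as used in the answer.

    Defined but not called here, because it performs real network requests.
    """
    import requests
    from bs4 import BeautifulSoup

    # Shortened for the sketch; the answer sends a full browser User-agent string.
    headers = {"User-agent": "Mozilla/5.0"}

    def fetch_html(url):
        return requests.get(url, headers=headers).text

    def extract_hrefs(html):
        soup = BeautifulSoup(html, "html.parser")
        main_div = soup.find("div", class_="main-menu2 main-menu-gray")
        if main_div is None:  # page without the menu: nothing to collect
            return []
        return [a["href"] for a in main_div.find_all("a", href=True)]

    return collect_sub_urls(parent_urls, fetch_html, extract_hrefs)
```

Passing the fetch and parse steps in as functions keeps the loop testable without network access; with the list `s` from the question, a call would look like `run_against_site(s)`.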