【Question title】:Select CSV file URLs from a soup object
【Posted】:2020-12-04 08:22:34
【Question description】:

How can I select the CSV file URLs from the soccer historical data page and save the files under names of the form "state" + "season" + "csv file name"? I'm lost in this area...

driver2 = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver2.get("https://www.football-data.co.uk/englandm.php")
pgsource2 = driver2.page_source
soup2 = BeautifulSoup(pgsource2, 'html.parser')

tables = soup2.find_all('table')

# find_all() returns a list-like ResultSet, so search each table in turn
for table in tables:
    for a in table.find_all('a', href=True):
        print(a['href'])

【Question comments】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:

    Here is my version:

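        import re

        import requests
        from bs4 import BeautifulSoup

        # Assumption: resp holds the page HTML; the original answer did not show how it was fetched.
        resp = requests.get("https://www.football-data.co.uk/englandm.php").text

        soup2 = BeautifulSoup(resp, 'html.parser')
        # The <i> headings and the CSV links are siblings, so use the parent
        # of the first CSV link as the container to walk through.
        main_table = soup2.find('a', href=re.compile(r'\.csv')).parent
        result = {}
        current_key = None
        for item in main_table:
            if item.name == 'i':
                # Each <i> element is treated as the heading of a new group of links
                current_key = item.text
                print(current_key)
                result.setdefault(current_key, [])
            elif item.name == 'a' and item.get('href') and current_key in result:
                result[current_key].append({'href': item['href'], 'text': item.text})

        print(result)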
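
The hrefs collected in `result` are presumably relative to the page, so they would have to be resolved against the page URL before anything can be downloaded. A minimal sketch of that step, assuming `result` has the shape built above; `BASE_URL`, `absolute_links`, and the sample entry are hypothetical names and data used only for illustration:

```python
from urllib.parse import urljoin

# Assumption: the page the links were scraped from.
BASE_URL = "https://www.football-data.co.uk/englandm.php"


def absolute_links(result, base_url=BASE_URL):
    """Flatten {heading: [{'href', 'text'}, ...]} into (heading, text, url) tuples."""
    rows = []
    for heading, links in result.items():
        for link in links:
            # urljoin resolves a relative href against the page URL
            rows.append((heading, link["text"], urljoin(base_url, link["href"])))
    return rows


# Hypothetical sample in the same shape as `result`
sample = {
    "Season 2020/2021": [
        {"href": "mmz4281/2021/E0.csv", "text": "Premier League"},
    ],
}
rows = absolute_links(sample)
```

Each tuple keeps the heading next to the URL, which is convenient if the heading should later become part of the saved file name.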
    

    【Comments】:

      【Solution 2】:

      You can find all the .csv files and save them locally like this:

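      from urllib.parse import urljoin

      import requests
      from bs4 import BeautifulSoup

      base_url = "https://www.football-data.co.uk/"
      page = requests.get(urljoin(base_url, "englandm.php")).text
      # Keep only anchors whose href mentions .csv; .get() avoids a KeyError
      # on anchors that have no href attribute.
      anchors = BeautifulSoup(page, "html.parser").find_all(
          lambda t: t.name == "a" and ".csv" in t.get("href", ""),
      )
      csv_links = [urljoin(base_url, a["href"]) for a in anchors]

      # Readable league names for the file names linked from the England page
      name_mapping = {
          "E0.csv": "Premier_League",
          "E1.csv": "Championship",
          "E2.csv": "League_1",
          "E3.csv": "League_2",
          "EC.csv": "Conference",
      }

      for csv_link in csv_links:
          # The last two URL segments are the season folder and the file name
          *_, season, file_name = csv_link.split("/")
          print(f"Fetching {csv_link}...")
          # Fall back to the bare file name if it is missing from the mapping
          league = name_mapping.get(file_name, file_name.split(".")[0])
          with open(f"{season}_{league}.csv", "wb") as f:
              f.write(requests.get(csv_link).content)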
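
The question asks for names of the form "state" + "season" + "csv file name", whereas the loop above uses only the season folder and a league name. A minimal sketch of building such a name from a link, assuming the last two path segments of the URL are the season and the file name; `local_name` and its `country` argument are hypothetical and not part of the original answer:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_name(csv_url, country):
    """Build '<country>_<season>_<file name>' from a link ending in .../<season>/<file>.csv."""
    parts = PurePosixPath(urlparse(csv_url).path).parts
    # Assumption: the last two path components are the season folder and the file name.
    season, file_name = parts[-2], parts[-1]
    return f"{country}_{season}_{file_name}"


# Example URL in the same shape as the links in csv_links
name = local_name("https://www.football-data.co.uk/mmz4281/2021/E0.csv", "England")
```

The returned string could be passed to `open(...)` in place of the f-string used in the download loop.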
      
      

      【Comments】:
