Webscraping with bs4: answers

【Question title】: Webscraping with bs4
【Posted】: 2016-09-02 18:01:04
【Question】:

import requests
from bs4 import BeautifulSoup

url = "http://bet.hkjc.com/football/index.aspx?lang=en"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

div = soup.find("div", {"class": "footballmaincontent"})
tables = div.find_all("table")
my_table = tables[2]

for row in my_table.find_all('tr'):
    cols = row.find_all('td')

    odds_list = []
    if len(cols) >= 10:
        match_no = (cols[0].text.strip())
        teams = (cols[2].text.strip())
        match_time = (cols[4].text.strip())
        home_odds = (cols[7].text.strip())
        away_odds = (cols[8].text.strip())
        draw_odds = (cols[9].text.strip())

        odds_row = (match_no,teams,match_time,home_odds,away_odds,draw_odds)
        odds_list.append(odds_row)

# Write to csv file
import csv
with open("odds_file.csv", "wb") as file:
    writer = csv.writer(file)
    for row in odds_list:
        writer.writerow(row)

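I tried to export the columns to a CSV file by appending them to odds_list inside the for loop, but nothing gets written to odds_file.csv.

I know something is wrong with this line:

odds_row = (match_no,teams,match_time,home_odds,away_odds,draw_odds)

But how do I get the list I built into the CSV file?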

【Comments】:

By using children[2], you are selecting the third table in the DOM. Is that what you want, the data in the third table of the DOM?

Tags: python-2.7 web-scraping beautifulsoup

【Solution 1】:

You already have my_table, so call find and find_all on my_table to get each <tr> and then its <td> cells; from each <td> you can read the text.

EDIT:

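# Needed only on Python 2.7 (the question's tag); harmless on Python 3.
from __future__ import print_function

import requests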
from bs4 import BeautifulSoup

url = "http://bet.hkjc.com/football/index.aspx?lang=en"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

div = soup.find("div", {"class": "footballmaincontent"})
tables = div.find_all("table")
my_table = tables[2]

for row in my_table.find_all('tr'):
    cols = row.find_all('td')
    if len(cols) >= 10:
        print(cols[0].text.strip(),'|',end='')
        print(cols[2].text.strip(),'|',end='')
        print(cols[4].text.strip(),'|',end='')
        print(cols[7].text.strip(),'|',end='')
        print(cols[8].text.strip(),'|',end='')
        print(cols[9].text.strip(),'|',end='')
        print()
        print('-'*40)

Result:

Match No. |Teams(Home vs Away) |Expected StopSelling Time |Home/Away/Draw | | |
----------------------------------------
FRI 9 |Romania U21 vs Luxembourg U21 |03/09 01:30 |Accept In Play Betting Only | | |
----------------------------------------
FRI 13 |St. Vincent and Grenadines vs USA |03/09 03:30 |35.00 |13.00 |1.02 |
----------------------------------------
FRI 14 |Honduras vs Canada |03/09 05:06 |1.45 |3.55 |6.50 |
----------------------------------------
FRI 15 |Trinidad and Tobago vs Guatemala |03/09 07:00 |1.67 |3.20 |4.70 |
----------------------------------------
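
The rows printed above can also be collected and written to CSV, which is what the question asked for. In the question's code, odds_list = [] sits inside the for loop, so the list is reset on every row; if the last row has fewer than 10 cells, the list is empty by the time the file is written, which is probably why the file came out blank. Below is a minimal sketch of one way to fix it, assuming Python 3 (on Python 2.7 the file would be opened with "wb", as in the question). The helper names collect_odds and write_odds_csv and the sample rows are made up for illustration; with bs4 you would pass in something like [[td.text for td in tr.find_all('td')] for tr in my_table.find_all('tr')].

```python
import csv

# Column positions taken from the question's code:
# match no., teams, stop-selling time, then the three odds columns.
WANTED_COLUMNS = (0, 2, 4, 7, 8, 9)


def collect_odds(rows):
    """Build one list of tuples from an iterable of rows of cell text."""
    odds_list = []  # created once, before the loop, so rows accumulate
    for cells in rows:
        if len(cells) >= 10:
            odds_list.append(tuple(cells[i].strip() for i in WANTED_COLUMNS))
    return odds_list


def write_odds_csv(path, odds_list):
    """Write the collected rows to a CSV file (Python 3 file mode)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(odds_list)


# Hypothetical sample rows standing in for the scraped <td> text.
sample_rows = [
    ["FRI 14", "", "Honduras vs Canada", "", "03/09 05:06",
     "", "", "1.45", "3.55", "6.50"],
    ["short", "row"],  # skipped: fewer than 10 cells
    ["FRI 15", "", "Trinidad and Tobago vs Guatemala", "", "03/09 07:00",
     "", "", "1.67", "3.20", "4.70"],
]

odds = collect_odds(sample_rows)
write_odds_csv("odds_file.csv", odds)
```

The only change that matters is where the list is created: once, before the loop, instead of once per row. Everything else mirrors the column indexes used in the question.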
