【Question title】: Webscraping with bs4
【Posted】: 2016-09-02 18:01:04
【Question】:
import requests
from bs4 import BeautifulSoup

url = "http://bet.hkjc.com/football/index.aspx?lang=en"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

div = soup.find("div", {"class": "footballmaincontent"})
tables = div.find_all("table")
my_table = tables[2]

for row in my_table.find_all('tr'):
    cols = row.find_all('td')

    odds_list = []
    if len(cols) >= 10:
        match_no = (cols[0].text.strip())
        teams = (cols[2].text.strip())
        match_time = (cols[4].text.strip())
        home_odds = (cols[7].text.strip())
        away_odds = (cols[8].text.strip())
        draw_odds = (cols[9].text.strip())

        odds_row = (match_no,teams,match_time,home_odds,away_odds,draw_odds)
        odds_list.append(odds_row)

# Write to csv file
import csv
with open("odds_file.csv", "wb") as file:
    writer = csv.writer(file)
    for row in odds_list:
        writer.writerow(row)

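I tried to export the columns to a csv file by appending them to "odds_list" inside the for loop, but nothing gets written to "odds_file".

I know something is wrong with

odds_row = (match_no,teams,match_time,home_odds,away_odds,draw_odds)

but how do I get the list I built into the csv file?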

【Comments】:

  • By using children[2], you are selecting the third table in the DOM. Is that what you want, the data in the third table in the DOM?

Tags: python-2.7 web-scraping beautifulsoup


【Solution 1】:

You already have my_table, so call find or find_all on my_table to get each <tr>, then each <td> inside it, and read the text from each <td>.


EDIT:

import requests
from bs4 import BeautifulSoup

url = "http://bet.hkjc.com/football/index.aspx?lang=en"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

div = soup.find("div", {"class": "footballmaincontent"})
tables = div.find_all("table")
my_table = tables[2]

for row in my_table.find_all('tr'):
    cols = row.find_all('td')
    if len(cols) >= 10:
        print(cols[0].text.strip(),'|',end='')
        print(cols[2].text.strip(),'|',end='')
        print(cols[4].text.strip(),'|',end='')
        print(cols[7].text.strip(),'|',end='')
        print(cols[8].text.strip(),'|',end='')
        print(cols[9].text.strip(),'|',end='')
        print()
        print('-'*40)

Result

Match No. |Teams(Home vs Away) |Expected StopSelling Time |Home/Away/Draw | | |
----------------------------------------
FRI 9 |Romania U21 vs Luxembourg U21 |03/09 01:30 |Accept In Play Betting Only | | |
----------------------------------------
FRI 13 |St. Vincent and Grenadines vs USA |03/09 03:30 |35.00 |13.00 |1.02 |
----------------------------------------
FRI 14 |Honduras vs Canada |03/09 05:06 |1.45 |3.55 |6.50 |
----------------------------------------
FRI 15 |Trinidad and Tobago vs Guatemala |03/09 07:00 |1.67 |3.20 |4.70 |
----------------------------------------
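
The code above prints the cells but does not cover the CSV step from the question. A plausible cause of the empty file is that `odds_list = []` sits inside the for loop, so the list is reset on every row and is likely empty again by the time the file is written. Below is a minimal sketch of one way to fix that, assuming Python 3. The helper names `collect_odds` and `write_odds` and the sample rows are made up for illustration; with the real page, the rows could be built as `[[td.text.strip() for td in tr.find_all('td')] for tr in my_table.find_all('tr')]`.

```python
import csv
import io


def collect_odds(rows):
    """Build one list of odds tuples from rows of already-stripped cell texts."""
    odds_list = []  # created once, before the loop, so rows accumulate
    for cols in rows:
        if len(cols) >= 10:
            # Same column positions as in the question: 0, 2, 4, 7, 8, 9
            odds_list.append(
                (cols[0], cols[2], cols[4], cols[7], cols[8], cols[9])
            )
    return odds_list


def write_odds(odds_list, file_obj):
    """Write every collected tuple as one CSV line to an open text file."""
    writer = csv.writer(file_obj)
    writer.writerows(odds_list)


# Hypothetical sample standing in for the scraped table (not real page data
# beyond the values shown in the result above).
sample_rows = [
    ["Header only"],  # fewer than 10 cells, so it is skipped
    ["FRI 14", "", "Honduras vs Canada", "", "03/09 05:06",
     "", "", "1.45", "3.55", "6.50"],
    ["FRI 15", "", "Trinidad and Tobago vs Guatemala", "", "03/09 07:00",
     "", "", "1.67", "3.20", "4.70"],
]

odds = collect_odds(sample_rows)

# An in-memory buffer is used here so the sketch runs without touching disk.
buffer = io.StringIO()
write_odds(odds, buffer)
print(buffer.getvalue())
```

To write a real file on Python 3, open it in text mode with `open("odds_file.csv", "w", newline="")` and pass the file object to `write_odds`. On Python 2.7, which the question is tagged with, the `"wb"` mode used in the question is the usual choice instead.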
