【Question title】: Error writing data to CSV due to ascii error in Python
【Posted】: 2015-08-13 14:38:16
【Question】:
import requests
from bs4 import BeautifulSoup
import csv
from urlparse import urljoin
import urllib2


base_url = 'http://www.baseball-reference.com'
data = requests.get("http://www.baseball-reference.com/teams/BAL/2014-schedule-scores.shtml")
soup = BeautifulSoup(data.content)
outfile = open("./Balpbp.csv", "wb")
writer = csv.writer(outfile)

url = []
for link in soup.find_all('a'):

    if not link.has_attr('href'):
        continue

    if link.get_text() != 'boxscore':
        continue

    url.append(base_url + link['href'])

for list in url:
    response = requests.get(list)
    html = response.content
    soup = BeautifulSoup(html)


    table = soup.find('table', attrs={'id': 'play_by_play'})

    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace('&nbsp;', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    writer.writerows(list_of_rows)

u'G.\xa0Holland', u'N.\xa0Cruz'...

Here is the error message:

Traceback (most recent call last):
  File "try.py", line 40, in <module>
    writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 57: ordinal not in range(128)

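When I write the data to the CSV, I end up with values containing \x... sequences, and these prevent the data from being written. How can I change the data to remove that part, or otherwise work around the problem?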

【Comments】:

    Tags: python csv web-scraping non-ascii-characters


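    【Answer 1】:

    You can't use unicode with the csv module in Python 2; you need to encode the strings:

    Note

    This version of the csv module doesn't support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.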
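
To make the note concrete, here is a minimal sketch of what goes wrong and why encoding helps. It assumes the cell value `u'G.\xa0Holland'` shown in the question; `encode_row` is a hypothetical helper name, not part of the csv module. The ASCII codec has no byte for the non-breaking space U+00A0, while UTF-8 does:

```python
# Sample value taken from the question; \xa0 is a non-breaking space.
cell = u'G.\xa0Holland'

# Python 2's csv writer in effect falls back to the ASCII codec for
# unicode cells, and ASCII cannot represent U+00A0.
try:
    cell.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every code point, so an explicit encode succeeds.
encoded = cell.encode('utf-8')


def encode_row(row):
    """Hypothetical helper: encode every unicode cell in a row to UTF-8 bytes."""
    return [value.encode('utf-8') for value in row]
```

In the question's loop, the same idea is a one-line change where each cell is extracted: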

    text = cell.text.replace('&nbsp;', '').encode("utf-8")
    

    Output after encoding:

    "Top of the 1st, Red Sox Batting, Tied 0-0, Orioles' Chris Tillman facing 1-2-3
    "
    t1,0-0,0,---,"7,(2-2) CBBFFFX",O,BOS,D. Nava,C. Tillman,2%,52%,Groundout: P-1B (P's Right)
    t1,0-0,1,---,"4,(1-2) BCFX",,BOS,D. Pedroia,C. Tillman,-2%,50%,Single to RF (Line Drive to Short RF)
    t1,0-0,1,1--,"5,(1-2) CFBFT",O,BOS,D. Ortiz,C. Tillman,3%,52%,Strikeout Swinging
    t1,0-0,2,1--,"4,(0-2) C1CFS",O,BOS,M. Napoli,C. Tillman,2%,55%,Strikeout Swinging
    ,,,,,,,,,"0 runs, 1 hit, 0 errors, 1 LOB. Red Sox 0, Orioles 0."
    "Bottom of the 1st, Orioles Batting, Tied 0-0, Red Sox' Jon Lester facing 1-2-3
    "
    b1,0-0,0,---,"4,(1-2) CBFX",O,BAL,N. Markakis,J. Lester,-2%,52%,Groundout: 3B-1B (Weak 3B)
    b1,0-0,1,---,"6,(3-2) BBFFBX",,BAL,J. Hardy,J. Lester,2%,55%,Single to LF (Line Drive)
    b1,0-0,1,1--,"4,(1-2) FBSX",O,BAL,A. Jones,J. Lester,-3%,52%,Popfly: SS (Deep SS)
    b1,0-0,2,1--,"5,(1-2) FFBFS",O,BAL,C. Davis,J. Lester,-2%,50%,Strikeout Swinging
    ....................................
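
If moving to Python 3 is an option (an assumption, since the question targets Python 2), the csv module there accepts text directly and the encoding is chosen when the file is opened, so no per-cell `.encode()` is needed. A rough sketch, with `write_rows` as a hypothetical helper and two made-up sample rows standing in for the scraped table:

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows of text cells to a UTF-8 encoded CSV file (Python 3 only)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as outfile:
        csv.writer(outfile).writerows(rows)


# Illustrative rows containing the non-breaking space from the question.
rows = [["b1", "BAL", "N.\xa0Markakis"], ["b1", "BAL", "J.\xa0Hardy"]]

# Write to a temporary directory so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "Balpbp.csv")
write_rows(path, rows)

# Read the raw bytes back to confirm the file is UTF-8 encoded.
with open(path, "rb") as infile:
    raw = infile.read()
```

If a plain space is preferred over the non-breaking one, `cell.replace(u'\xa0', u' ')` before writing swaps it out in either Python version.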
    

    【Comments】:
