【发布时间】:2015-08-13 14:38:16
【问题描述】:
import requests
from bs4 import BeautifulSoup
import csv
from urlparse import urljoin
import urllib2
base_url = 'http://www.baseball-reference.com'
data = requests.get("http://www.baseball-reference.com/teams/BAL/2014-schedule-scores.shtml")
soup = BeautifulSoup(data.content)
outfile = open("./Balpbp.csv", "wb")
writer = csv.writer(outfile)
url = []
for link in soup.find_all('a'):
if not link.has_attr('href'):
continue
if link.get_text() != 'boxscore':
continue
url.append(base_url + link['href'])
for list in url:
response = requests.get(list)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('table', attrs={'id': 'play_by_play'})
list_of_rows = []
for row in table.findAll('tr'):
list_of_cells = []
for cell in row.findAll('td'):
text = cell.text.replace(' ', '')
list_of_cells.append(text)
list_of_rows.append(list_of_cells)
writer.writerows(list_of_rows)
u'G.\xa0Holland', u'N.\xa0Cruz'...
这是错误信息:
Traceback (most recent call last):
File "try.py", line 40, in <module>
writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 57: ordinal not in range(128)
当我将数据写入 csv 时,我最终会得到包含 \x... 的数据,这些内容会阻止数据写入 csv。我该如何更改数据以删除这部分数据或采取一些措施来规避此问题?
【问题讨论】:
标签: python csv web-scraping non-ascii-characters