【Posted】: 2017-06-27 03:29:48
【Question】:
I have looked at several answers, including this one, but none of them seems to answer my question.
Here are some sample rows from the CSV:
_id category
ObjectId(56266da778d34fdc048b470b) [{"group":"Home","id":"53cea0be763f4a6f4a8b459e","name":"Cleaning Services","name_singular":"Cleaning Service"}]
ObjectId(56266e0c78d34f22058b46de) [{"group":"Local","id":"5637a1b178d34f20158b464f","name":"Balloon Dí©cor","name_singular":"Balloon Dí©cor"}]
Here is my code:
import csv
import sys
from sys import argv
import json
def ReadCSV(csvfile):
    with open('newCSVFile.csv','wb') as g:
        filewriter = csv.writer(g) #, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        with open(csvfile, 'rb') as f:
            reader = csv.reader(f) # create reader object
            next(reader) # skip first row
            for row in reader: # go through all the rows
                listForExport = [] # initialize list that will have two items: id and list of categories
                # ID section
                vendorId = str(row[0]) # pull the raw vendor id out of the first column of the csv
                vendorId = vendorId[9:33] # slice to remove the ObjectId label and parentheses
                listForExport.append(vendorId) # add vendor ID as the first item in the list
                # categories section
                tempCatList = [] # temporary list of categories for the second item in listForExport
                #this is line 41 where the error stems
                categories = json.loads(row[1]) #create's a dict with the categoreis from a given row
                for names in categories: # loop through the category names using the key 'name'
                    print names['name']
Here is what I get:
Cleaning Services
Traceback (most recent call last):
  File "csvtesting.py", line 57, in <module>
    ReadCSV(csvfile)
  File "csvtesting.py", line 41, in ReadCSV
    categories = json.loads(row[1]) #create's a dict with the categoreis from a given row
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-10: invalid continuation byte
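So the code extracts the first category, Cleaning Services, but fails as soon as it reaches the non-ASCII characters.
How should I handle this? I would be happy to simply drop all non-ASCII items.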
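The failure can probably be reproduced in isolation. The sketch below assumes the file stores the two characters shown as "í©" as the bytes 0xED 0xA9, which is consistent with positions 9-10 in the traceback but is not confirmed by the question. The helper name try_utf8 is made up for illustration.

```python
# Assumption: these are the raw bytes of the second row's "name" value.
raw_name = b'Balloon D\xed\xa9cor'

def try_utf8(raw):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError as exc:
        return exc

result = try_utf8(raw_name)
# result is the exception object; result.start is the index of the first bad byte
print(repr(result))
```

If the assumption holds, the bytes are not valid UTF-8, so any step that decodes the cell as UTF-8 (as json.loads does here) will raise before the loop over the categories is reached.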
【Comments】:
- Have you tried your_string.encode('unicode_escape').decode('utf-8', 'ignore')?
- No. Where in the code would I put it?
- I guess in this case your_string would be names['name'].
- Don't you need to pass delimiter=' ' to csv.reader?
- @Coldspeed The error stems from categories = json.loads(row[1]), two lines above that.
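Since the exception is raised inside json.loads(row[1]), any cleanup would have to happen before parsing rather than on names['name']. The following is a minimal sketch of one way to drop non-ASCII bytes first, not a confirmed fix. It assumes the cell is a byte string (as csv.reader yields on Python 2 with a file opened in 'rb' mode); parse_categories and the sample value are hypothetical.

```python
import json

def parse_categories(raw_cell):
    """Parse a JSON cell after discarding every non-ASCII byte.

    raw_cell is assumed to be a byte string, as csv.reader yields on
    Python 2 when the file is opened in 'rb' mode.
    """
    ascii_text = raw_cell.decode('ascii', 'ignore')  # drops bytes >= 0x80
    return json.loads(ascii_text)

# Hypothetical sample cell; the \xed\xa9 bytes are an assumption about the file.
sample = (b'[{"group":"Local","id":"5637a1b178d34f20158b464f",'
          b'"name":"Balloon D\xed\xa9cor","name_singular":"Balloon D\xed\xa9cor"}]')

for category in parse_categories(sample):
    print(category['name'])  # prints: Balloon Dcor
```

In the loop from the question this would amount to categories = parse_categories(row[1]). The characters are lost rather than repaired, which matches the stated wish to drop non-ASCII items. Preserving them would require knowing the file's real encoding.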