【Question title】: UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' when converting a Series object to unicode in pandas with utf-16
【Posted】: 2014-05-31 18:45:19
【Description】:

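I have a UTF-16 CSV file that I am trying to load into pandas. By default the data comes in with the object dtype. I plan to do some modeling on the caption column, so I want to convert df['caption'] from object to unicode strings. Currently I get the error 'UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 6: ordinal not in range(128)' when doing df['caption']=df['caption'].astype(unicode).

I tried to work around this by calling encode and decode on the individual values in the df['caption'] column, but I could not get it to work.

I am very new to pandas and unicode, so I would appreciate any insight into what I am doing wrong.

Thanks in advance.

Teresa

Additional information follows.

The traceback is as follows: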

UnicodeEncodeError: Traceback (most recent call last)
<ipython-input-5-aad36f4acf38> in <module>()
    10 print df['caption'].head(10)
    11 
---> 12 df['caption']=df['caption'].astype(unicode)

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/generic.pyc in astype(self, dtype, copy, raise_on_error)
   2016 
   2017         mgr = self._data.astype(
-> 2018             dtype, copy=copy, raise_on_error=raise_on_error)
   2019         return self._constructor(mgr).__finalize__(self)
   2020 

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/internals.pyc in astype(self, *args, **kwargs)
   2414 
   2415     def astype(self, *args, **kwargs):
-> 2416         return self.apply('astype', *args, **kwargs)
   2417 
   2418     def convert(self, *args, **kwargs):

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   2373 
   2374             else:
-> 2375                 applied = getattr(blk, f)(*args, **kwargs)
   2376 
   2377             if isinstance(applied, list):

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/internals.pyc in astype(self, dtype, copy, raise_on_error, values)
    425     def astype(self, dtype, copy=False, raise_on_error=True, values=None):
    426         return self._astype(dtype, copy=copy, raise_on_error=raise_on_error,
--> 427                             values=values)
    428 
    429     def _astype(self, dtype, copy=False, raise_on_error=True, values=None,

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/internals.pyc in _astype(self, dtype, copy, raise_on_error, values, klass)
    442             # force the copy here
    443             if values is None:
--> 444                 values = com._astype_nansafe(self.values, dtype, copy=True)
    445             newb = make_block(values, self.items, self.ref_items,
    446                               ndim=self.ndim, placement=self._ref_locs,

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/common.pyc in _astype_nansafe(arr, dtype, copy)
   2222         return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
   2223     elif issubclass(dtype.type, compat.string_types):
-> 2224         return lib.astype_str(arr.ravel()).reshape(arr.shape)
   2225 
   2226     if copy:

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.astype_str (pandas/lib.c:12944)()

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.astype_str (pandas/lib.c:12862)()

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 6: ordinal not in range(128)

My code is as follows:

import pandas as pd
import numpy as np

df = pd.read_csv('Chevrolet_4-7-2014_cvid_data.csv',encoding='utf-16',header=0,na_values=['N/A',''],names=['channel','link','title','posted','views','likes','dislikes','description','category','statdate','statviews','timewatched','averagetw','subsdriven','shares','caption'])
print df.head(5)
print df.dtypes


print df['caption'].head(10)

df['caption']=df['caption'].astype(unicode)

The data looks like this:

channel                                        link  \
0  Chevrolet  http://www.youtube.com/watch?v=dCayKZe6WvI   
1  Chevrolet  http://www.youtube.com/watch?v=IRXK35dPXbE   
2  Chevrolet  http://www.youtube.com/watch?v=XXdj4QMw748   
3  Chevrolet  http://www.youtube.com/watch?v=_ger32ROs94   
4  Chevrolet  http://www.youtube.com/watch?v=Chfm7Pou49k   
5  Chevrolet  http://www.youtube.com/watch?v=ySmEJyQ94BI   

                                           title       posted   views  \
0  Chevy Open House Event: From Our House to Your...  Apr  1 2014   73111   
1  Truck Towing Capabilities: 2014 Silverado -- #...  Mar 26 2014   11934   
2  Potholes at the Milford Proving Grounds: Tips ...  Mar 20 2014    8037   
3  Diesel Trucks: Heavy Duty Strengths -- 2015 Si...  Mar 20 2014   12096   
4  Captain America: All in a Day's Work -- 2014 T...  Mar 14 2014   93377   
5  Media Blasting: Camaro Engineering -- 2014 Cam...  Mar 13 2014  109931   

   likes  dislikes                                        description  \
0     43        13  In March over 100000 people visited our Chevy ...   
1    183        56  Farmer Dewayne Kleman and General Motors engin...   
2     58        10  Chevrolet vehicles are carefully designed to w...   
3    210         6  Introducing the all-new 2015 Silverado HD. The...   
4   1095        35  From saving the world to working on math homew...   

       category statdate  statviews timewatched averagetw  subsdriven  \
0  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   
1  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   
2  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   
3  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   
4  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   

   shares                                            caption  
0     NaN   The Chevy Spring Open House Sale the perfect ...  
1     NaN   0:03 A Man And His Truck And An Engineer / To...  
2     NaN   0:02 Severe Bump road sign 0:07 Pothole Facil...  
3     NaN   0:03 And there's no stronger Silverado than t...  
4     NaN   0:03 Are you doing anything fun Saturday nigh...  
5     NaN   0:05 Camaro Z/28 logo 0:07 Z/28 Bead Lock 0:0...  

[5 rows x 16 columns]
channel         object
link            object
title           object
posted          object
views           object
likes            int64
dislikes         int64
description     object
category        object
statdate        object
statviews      float64
timewatched     object
averagetw       object
subsdriven     float64
shares         float64
caption         object

dtype: object
0     The Chevy Spring Open House Sale the perfect ...
1     0:03 A Man And His Truck And An Engineer / To...
2     0:02 Severe Bump road sign 0:07 Pothole Facil...
3     0:03 And there's no stronger Silverado than t...
4     0:03 Are you doing anything fun Saturday nigh...
5     0:05 Camaro Z/28 logo 0:07 Z/28 Bead Lock 0:0...

Name: caption, dtype: object

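【Comments】:

  • Are you sure it isn't already utf-16? You specified it when reading the csv.
  • Hmm, the thing is that df['caption'].dtype returns object as the dtype... so I'm not sure. Ultimately I want to run the DataFrame through nltk and scikit-learn for some predictive modeling, so I want to make sure I have the right data types.
  • object is the generic dtype numpy uses for values that are not native numeric types, such as strings, so it is quite possible the values are still utf-16 strings.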
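
For what it's worth, the traceback ends in lib.astype_str, which suggests that each value is being converted to a byte string, and on Python 2 that conversion falls back to the ASCII codec. The sketch below reproduces the same kind of failure using only the standard library. The caption value and the try_ascii helper are made up for illustration and are not taken from the real file.

```python
# A hypothetical caption with a curly quote (U+201C) at index 6,
# mirroring the position reported in the traceback.
caption = u" 0:03 \u201cAre you doing anything fun\u201d"


def try_ascii(text):
    """Return the ASCII-encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)


# ASCII has no representation for U+201C, so this yields the error message.
ascii_result = try_ascii(caption)

# UTF-8 can represent every code point, so this succeeds.
utf8_result = caption.encode("utf-8")

print(ascii_result)
print(repr(utf8_result))
```

If this is what is happening, the values in the column are already unicode text and the cast is the step that breaks, not the read.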

Tags: python python-2.7 unicode pandas


【Solution 1】:

Could you try adding dtype={'caption' : str} to your read_csv() call? Like:

df = pd.read_csv('Chevrolet_4-7-2014_cvid_data.csv',
     encoding='utf-16',
     header=0,
     na_values=['N/A',''],
     names=[...],
     dtype={'caption' : str})

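By the way, pandas defaults to header=0 when names is not given. Not that I can see your CSV, but your use of the names keyword argument may be redundant, because pandas automatically uses the column names when they are in row 0 of the CSV. Either way, let me know whether the other thing works for you. :)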
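
A quick way to check what the suggestion does is to try it on a tiny UTF-16 file. The sketch below assumes Python 3, where str is already unicode text; on Python 2.7, str means a byte string, so the behavior may differ. The sample rows and the file name are hypothetical stand-ins for the real CSV.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row sample; the first caption contains curly quotes.
sample = (
    u"title,caption\n"
    u"Open House,\u201cSpring sale\u201d\n"
    u"Towing,0:03 A man and his truck\n"
)

# Write the sample as UTF-16; the "utf-16" codec emits a byte order mark first.
path = os.path.join(tempfile.mkdtemp(), "sample_utf16.csv")
with open(path, "wb") as handle:
    handle.write(sample.encode("utf-16"))

# Read it back with the same encoding and request string values for caption.
df = pd.read_csv(path, encoding="utf-16", dtype={"caption": str})

# Inspect the Python type of each value rather than relying on the column dtype.
value_types = sorted(set(type(v).__name__ for v in df["caption"]))

print(df["caption"].dtype)
print(value_types)
```

Looking at the type of the individual values is more telling than the column dtype, since the dtype alone does not say whether a column holds text.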
