【Question title】: Python 3 not recognizing vertical bar character
【Posted】: 2017-11-08 13:04:28
【Question description】:

I have the following code, but Python 3 does not seem to recognize the vertical bar (pipe) as a Unicode character.

    import pandas as pd

    m_cols = ['movie_id', 'title', 'release_date',
              'video_release_date', 'imdb_url']

    # u.item is pipe-delimited; keep only the first five columns
    movies = pd.read_csv(
        'http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
        sep='|', names=m_cols, usecols=range(5))

    movies.head()

I get the following error:

    UnicodeDecodeError                        Traceback (most recent call last)
    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14858)()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)()

    pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)()

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

    During handling of the above exception, another exception occurred:

    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-15-72a8222212c1> in <module>()
          4 movies = pd.read_csv(
          5     'http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
    ----> 6     sep='|', names=m_cols, usecols=range(5))
          7 
          8 movies.head()

What could be causing this, and how can I fix it?

【Comments】:

Tags: python pandas


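【Solution 1】:

The vertical bar is not the problem. The error says that byte 0xe9 cannot be decoded as UTF-8, which means the file is not UTF-8 encoded; 0xe9 is "é" in Latin-1 (ISO-8859-1). In Python 3, pass encoding="latin-1":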

    In [9]: movies = pd.read_csv(
                'http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
                sep='|', names=m_cols, usecols=range(5), header=None, encoding="latin-1")

    In [10]: movies.head()
    Out[10]: 
       movie_id              title release_date  video_release_date  \
    0         1   Toy Story (1995)  01-Jan-1995                 NaN   
    1         2   GoldenEye (1995)  01-Jan-1995                 NaN   
    2         3  Four Rooms (1995)  01-Jan-1995                 NaN   
    3         4  Get Shorty (1995)  01-Jan-1995                 NaN   
    4         5     Copycat (1995)  01-Jan-1995                 NaN   

                                                imdb_url  
    0  http://us.imdb.com/M/title-exact?Toy%20Story%2...  
    1  http://us.imdb.com/M/title-exact?GoldenEye%20(...  
    2  http://us.imdb.com/M/title-exact?Four%20Rooms%...  
    3  http://us.imdb.com/M/title-exact?Get%20Shorty%...  
    4  http://us.imdb.com/M/title-exact?Copycat%20(1995)  
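
To see why the codec matters more than the separator, here is a minimal sketch using only the standard library. The byte string `raw` is a made-up sample, not an actual row from u.item; it was chosen so that byte 0xe9 sits at position 3, as in the error message above.

```python
# Hypothetical sample line (NOT taken from u.item): "Café au lait"
# encoded as Latin-1, so "é" is the single byte 0xe9 at index 3.
raw = b"Caf\xe9 au lait (1993)|01-Jan-1993"

# Decoding as UTF-8 fails: 0xe9 announces a multi-byte sequence, but
# the next byte (a space) is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
    utf8_error = ""
except UnicodeDecodeError as exc:
    utf8_ok = False
    utf8_error = str(exc)

# Latin-1 maps every byte value to exactly one character, so decoding
# with it cannot raise; here it also yields the intended text.
decoded = raw.decode("latin-1")

# The pipe is plain ASCII and splits the same way under either codec.
fields = decoded.split("|")

print(utf8_ok, utf8_error)
print(fields)
```

Because Latin-1 never fails to decode, a successful read does not by itself prove that the file really is Latin-1; it is worth spot-checking a few titles with accented characters after loading.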

【Comments】:
