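【Posted】: 2015-10-04 12:25:44
【Question】:
I need to store a large number of messages in an HDFStore, and some of them contain emoji or special characters such as éěščřžýáí. Everything seems to work until I try to load the data back, at which point it crashes with the error below. Here is sample code that ends in the error: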
import pandas as pd
df = pd.DataFrame(columns=["A"])
toAppend = {"A": "é"}
df = df.append(toAppend, ignore_index = True)
df['A'] = df['A'].astype(str)
store = pd.HDFStore(r'thiswillcrash.h5')
store.put('df', df, format='table', encoding="utf-8")
d = store["df"]
print(d)
store.close()
This is the error:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in _unconvert_string_array(data, nan_rep, encoding)
4407 dtype = "S{0}".format(itemsize)
-> 4408 data = data.astype(dtype, copy=False).astype(object, copy=False)
4409 except (Exception) as e:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
During handling of the above exception, another exception occurred:
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-8-f2a5372d5498> in <module>()
8 store = pd.HDFStore(r'iwillcrash18.h5')
9 store.put('df', df, format='table', encoding="utf-8")
---> 10 d = store["df"]
11 print(d)
12
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in __getitem__(self, key)
416
417 def __getitem__(self, key):
--> 418 return self.get(key)
419
420 def __setitem__(self, key, value):
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in get(self, key)
626 if group is None:
627 raise KeyError('No object named %s in the file' % key)
--> 628 return self._read_group(group)
629
630 def select(self, key, where=None, start=None, stop=None, columns=None,
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in _read_group(self, group, **kwargs)
1274 s = self._create_storer(group)
1275 s.infer_axes()
-> 1276 return s.read(**kwargs)
1277
1278
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in read(self, where, columns, **kwargs)
4006 def read(self, where=None, columns=None, **kwargs):
4007
-> 4008 if not self.read_axes(where=where, **kwargs):
4009 return None
4010
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in read_axes(self, where, **kwargs)
3218 for a in self.axes:
3219 a.set_info(self.info)
-> 3220 a.convert(values, nan_rep=self.nan_rep, encoding=self.encoding)
3221
3222 return True
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in convert(self, values, nan_rep, encoding)
2071 if _ensure_decoded(self.kind) == u('string'):
2072 self.data = _unconvert_string_array(
-> 2073 self.data, nan_rep=nan_rep, encoding=encoding)
2074
2075 return self
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in _unconvert_string_array(data, nan_rep, encoding)
4409 except (Exception) as e:
4410 f = np.vectorize(lambda x: x.decode(encoding), otypes=[np.object])
-> 4411 data = f(data)
4412
4413 if nan_rep is None:
C:\Users\Filip\Anaconda3\lib\site-packages\numpy\lib\function_base.py in __call__(self, *args, **kwargs)
1698 vargs.extend([kwargs[_n] for _n in names])
1699
-> 1700 return self._vectorize_call(func=func, args=vargs)
1701
1702 def _get_ufunc_and_otypes(self, func, args):
C:\Users\Filip\Anaconda3\lib\site-packages\numpy\lib\function_base.py in _vectorize_call(self, func, args)
1767 for _a in args]
1768
-> 1769 outputs = ufunc(*inputs)
1770
1771 if ufunc.nout == 1:
C:\Users\Filip\Anaconda3\lib\site-packages\pandas\io\pytables.py in <lambda>(x)
4408 data = data.astype(dtype, copy=False).astype(object, copy=False)
4409 except (Exception) as e:
-> 4410 f = np.vectorize(lambda x: x.decode(encoding), otypes=[np.object])
4411 data = f(data)
4412
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 0: unexpected end of data
I am using pandas 0.16.2 and PyTables 3.2.2.
【Comments】:
-
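Please provide a minimal example. Also, make sure you read and understand what this error means. It is actually a common error, and you should be able to find out what it means with some basic research.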
-
I put the example in the first code block. I have been researching this for a long time, and I believe I am doing everything correctly.
-
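Actually, I think so too. Forgive me, I spoke too soon. A few things are unnecessary (df['A'] = df['A'].astype(str) and encoding="utf-8"), but that does not change anything. If you add more data to the string, e.g. toAppend = {"A": "aée"}, the data is stored and retrieved without an error, but the result is corrupted. I would call this a bug in pandas or PyTables (0.14.1-2 and 3.1.1-3 here).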
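One possible reading of the two symptoms reported here (the lone 0xc3 byte that fails with "unexpected end of data", and "aée" coming back damaged) is that the string is encoded to UTF-8 but the fixed-width column is sized by character count rather than by byte count. This is an assumption, not something confirmed from the pandas source. The sketch below uses only NumPy to show what that kind of truncation would look like; the helper names store_with_char_width and store_with_byte_width are hypothetical.

```python
import numpy as np


def store_with_char_width(text, encoding="utf-8"):
    """Encode text, then fit it into a byte field sized by character count.

    Hypothetical helper: it mimics a writer that (by assumption) sizes the
    column from len(text) instead of from the encoded length.
    """
    encoded = text.encode(encoding)
    # NumPy silently truncates bytes that do not fit the "S<n>" width.
    field = np.array([encoded], dtype="S{0}".format(len(text)))
    return bytes(field[0])


def store_with_byte_width(text, encoding="utf-8"):
    """Same idea, but the field is sized from the encoded byte length."""
    encoded = text.encode(encoding)
    field = np.array([encoded], dtype="S{0}".format(len(encoded)))
    return bytes(field[0])


for text in ("é", "aée"):
    raw = store_with_char_width(text)
    try:
        print(repr(text), "char width ->", repr(raw.decode("utf-8")))
    except UnicodeDecodeError as exc:
        print(repr(text), "char width -> decode failed:", exc)
    print(repr(text), "byte width ->", repr(store_with_byte_width(text).decode("utf-8")))
```

Under this assumption, "é" is cut down to the single byte 0xc3 and cannot be decoded, while "aée" is cut to three bytes and decodes to "aé", silently losing its last character. Sizing the field from the encoded bytes round-trips both strings.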
Tags: python pandas unicode pytables hdfstore