【Posted】:2020-03-19 06:37:48
【Problem description】:
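I'm having trouble reading a csv file with pandas: only the first row's date is parsed correctly (the following rows come out as NaN or NaT). I tried opening the file directly to see what it looks like: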
f = open('20191122.csv', "r", encoding='ascii')
f.read(300)
The first 300 characters show that each line ends with \n\x00:
'20191122 21:29,1,59,-999,42,-999.9,-999.9,37,34,1,0.0,0.4,0.4,0.4,0,0,0,0,0,10.1,9.6,0.0,0,33.7,36.0,75.4,29.6,14.0,59.5,32.7,6.7,6.8,0.2,-\n\x0020191122 21:30,1,59,-999,42,-999.9,-999.9,37,34,1,0.0,0.4,0.4,0.4,0,0,0,0,0,10.0,9.8,0.0,0,33.4,35.9,74.9,29.0,13.9,59.6,32.7,6.6,6.6,0.2,-\n\x0020191122 21:30,1,5'
When I read the file line by line, the first line is fine:
data = f.readlines()
data[0]
'20191122 21:29,1,59,-999,42,-999.9,-999.9,37,34,1,0.0,0.4,0.4,0.4,0,0,0,0,0,10.1,9.6,0.0,0,33.7,36.0,75.4,29.6,14.0,59.5,32.7,6.7,6.8,0.2,-\n'
But every other line starts with \x00, so its date cannot be parsed:
data[1]
'\x0020191122 21:30,1,59,-999,42,-999.9,-999.9,37,34,1,0.0,0.4,0.4,0.4,0,0,0,0,0,10.0,9.8,0.0,0,33.4,35.9,74.9,29.0,13.9,59.6,32.7,6.6,6.6,0.2,-\n'
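So the problem seems to be related to encoding? I have already run the chardet package on the csv file, and it reports the same thing: ascii with confidence 1.0. But I can't seem to find an answer on how to deal with the \x00...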
【Discussion】:
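One possible workaround, offered as a sketch rather than a confirmed fix: \x00 is a valid ASCII character (NUL), which would explain why chardet still reports ascii. If the NUL bytes carry no data (an assumption; they look like terminators written by the logging device), they can be removed before the text reaches pandas. The helper names `read_text_without_nuls` and `load_csv_without_nuls` are hypothetical, and the date format `%Y%m%d %H:%M` is inferred from the sample rows in the question.

```python
import io
import os
import tempfile


def read_text_without_nuls(path, encoding="ascii"):
    """Read a file as bytes, drop every NUL (\\x00) byte, and decode the rest."""
    with open(path, "rb") as fh:
        raw = fh.read()
    return raw.replace(b"\x00", b"").decode(encoding)


def load_csv_without_nuls(path):
    """Parse the cleaned text with pandas (hypothetical helper).

    Assumes the file has no header row and that column 0 holds
    timestamps such as '20191122 21:29'.
    """
    import pandas as pd  # imported here so the helper above works without pandas

    text = read_text_without_nuls(path)
    df = pd.read_csv(io.StringIO(text), header=None)
    df[0] = pd.to_datetime(df[0], format="%Y%m%d %H:%M")
    return df


# Small demo on a temporary file that mimics the "\n\x00" line endings.
sample = b"20191122 21:29,1,59\n\x0020191122 21:30,1,59\n\x00"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
demo_text = read_text_without_nuls(tmp.name)
os.remove(tmp.name)
```

With the real file, `load_csv_without_nuls('20191122.csv')` would be the call to try. Reading the whole file into memory is fine for a daily log of this size; for very large files, filtering line by line would be a better fit.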
Tags: python pandas csv encoding ascii