【Question title】: Encoding issue when reading csv - lines end with \n\x00
【Posted】: 2020-03-19 06:37:48
【Question】:

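I am having trouble reading a csv file with pandas: only the first row has its date parsed correctly (the following rows come out as NaN/NaT). I tried opening the file directly to see what it looks like: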

f = open('20191122.csv', "r", encoding='ascii')
f.read(300)

The first 300 characters show that lines end with \n\x00:

'20191122 21:29,1,59,-999,42,-999.9,-999.9,37,34,1,0.0,0.4,0.4,0.4,0,0,0,0,0,10.1,9.6,0.0,0,33.7,36.0,75.4,29.6,14.0,59.5,32.7,6.7,6.8,0.2,-\n\x0020191122 21:30,1,59,-999,42,-999.9,-999.9,37,34,1,0.0,0.4,0.4,0.4,0,0,0,0,0,10.0,9.8,0.0,0,33.4,35.9,74.9,29.0,13.9,59.6,32.7,6.6,6.6,0.2,-\n\x0020191122 21:30,1,5'

When reading line by line, the first line is fine:

data = f.readlines()
data[0]
'20191122 21:29,1,59,-999,42,-999.9,-999.9,37,34,1,0.0,0.4,0.4,0.4,0,0,0,0,0,10.1,9.6,0.0,0,33.7,36.0,75.4,29.6,14.0,59.5,32.7,6.7,6.8,0.2,-\n'

But the remaining lines start with \x00, so the date cannot be parsed:

data[1]
'\x0020191122 21:30,1,59,-999,42,-999.9,-999.9,37,34,1,0.0,0.4,0.4,0.4,0,0,0,0,0,10.0,9.8,0.0,0,33.4,35.9,74.9,29.0,13.9,59.6,32.7,6.6,6.6,0.2,-\n'

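So the problem seems to be related to encoding? I have tried the chardet package on the csv file and it gives the same result: ascii with confidence 1.0. But I can't seem to find an answer on how to handle the \x00...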

【Comments】:

    Tags: python pandas csv encoding ascii


    【Solution 1】:

    In little-endian UTF-16, a newline is encoded as the bytes "\n\x00". f.readlines() splits after the \n, which leaves the \x00 at the start of the next line.

    So try:

    data = open(...).read()
    data.encode().decode("utf-16")
    

    【Comments】:

    • Did not work, got the following error message: AttributeError: 'str' object has no attribute 'decode'
    • Sorry, that didn't work either.. Just renders a lot of rubbish: '〲㤱ㄱ㈲㈠㨱㤲ㄬ㔬ⰹ㤭㤹㐬ⰲ㤭㤹㤮\u2d2c㤹⸹ⰹ㜳㌬ⰴⰱ⸰ⰰ⸰ⰴ⸰ⰴ⸰ⰴⰰⰰⰰⰰⰰ〱ㄮ㤬㘮〮〬〬㌬⸳ⰷ㘳〮㜬⸵ⰴ㤲㘮ㄬ⸴ⰰ㤵㔮㌬⸲ⰷ⸶ⰷ⸶ⰸ⸰ⰲਭ㈀\u3130ㄹ㈱′ㄲ㌺ⰰⰱ㤵\u2d2c㤹ⰹ㈴\u2d2c㤹⸹ⰹ㤭㤹㤮㌬ⰷ㐳ㄬ〮〬〬㐮〬㐮〬㐮〬〬〬〬〬ㄬ⸰ⰰ⸹ⰸ⸰ⰰⰰ㌳㐮㌬⸵ⰹ㐷㤮㈬⸹ⰰ㌱㤮㔬⸹ⰶ㈳㜮㘬㘮㘬㘮〬㈮\u2d2c\n〲㤱ㄱ㈲㈠㨱〳ㄬ㔬ⰹ㤭㤹㐬ⰲ㤭㤹㤮\u2d2c㤹⸹ⰹ㜳㌬ⰴⰱ⸰ⰰ⸰ⰴ⸰ⰴ⸰ⰴⰰⰰⰰⰰⰰ⸹ⰸ〱〮〮〬〬㌬⸳ⰰ㔳㤮㜬⸴ⰴ㠲㐮ㄬ⸳ⰵ㤵㐮㌬⸲ⰷ⸶ⰵ⸶ⰶ⸰ⰲਭ㈀\u3130ㄹ㈱′ㄲ㌺ⰱⰱ㤵\u2d2c㤹ⰹ㈴\u2d2c㤹⸹ⰹ㤭㤹㤮㌬ⰷ㐳ㄬ〮〬〬㐮〬㐮〬㐮〬〬〬〬〬㤬㜮ㄬ⸰ⰱ⸰ⰰⰰ㈳㜮㌬⸵ⰸ㐷〮㈬⸷ⰸ
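
The garbled output in the comment above is what one would expect if the file is plain ASCII with a stray NUL byte after each newline, rather than genuine UTF-16. The following is a minimal sketch of that reasoning; the `raw` bytes are a shortened, made-up sample shaped like the data in the question, not the actual file.

```python
# Two lines shaped like the sample in the question: ASCII text with a
# stray NUL byte after each newline (shortened; an assumption, not the real file).
raw = b"20191122 21:29,1,59\n\x0020191122 21:30,1,59\n\x00"

# Genuine UTF-16-LE uses two bytes for *every* character, not just "\n".
genuine = "20191122 21:29\n".encode("utf-16-le")
print(genuine[:8])  # b'2\x000\x001\x009\x00'

# Decoding the ASCII-with-NULs bytes as UTF-16 pairs up unrelated bytes
# and yields CJK-looking characters, like the output quoted in the comment.
even = raw[: len(raw) // 2 * 2]  # UTF-16 needs an even number of bytes
garbled = even.decode("utf-16-le")
print(garbled[:4])

# Dropping the NUL bytes and decoding as ASCII recovers the text.
clean = raw.replace(b"\x00", b"").decode("ascii")
print(clean.splitlines())
```

If the real file behaves like this sample, the \x00 bytes are probably padding or terminators written by the logging device, and removing them is a more promising route than changing the encoding.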
    【Solution 2】:

    You can try replacing it!

    # Read the whole file once (the with-block closes it afterwards)
    with open(tmp_file_name, 'r') as fi:
        tmp_file_csv_data = fi.read()

    if '\x00' in tmp_file_csv_data:
        print("you have null bytes in your input file")
        # Overwrite the file in place with the null bytes stripped
        with open(tmp_file_name, 'w') as fo:
            fo.write(tmp_file_csv_data.replace('\x00', ''))
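
The snippet above overwrites the input file. As a hedged alternative, the null bytes can be stripped in memory so the original file stays untouched. The sketch below uses only the standard library; the helper name `read_rows_without_nulls`, the demo data, and the assumption that the first column is a timestamp like '20191122 21:29' are illustrative and not taken from the answer.

```python
import csv
import io
import os
import tempfile
from datetime import datetime


def read_rows_without_nulls(path):
    """Parse a CSV file, dropping NUL bytes in memory (the file is not modified)."""
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.replace(b"\x00", b"").decode("ascii")
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip empty lines
        # Assumption: the first column is a timestamp like '20191122 21:29'
        stamp = datetime.strptime(row[0], "%Y%m%d %H:%M")
        rows.append((stamp, row[1:]))
    return rows


# Demo on a throwaway file shaped like the sample in the question.
sample = b"20191122 21:29,1,59,-999\n\x0020191122 21:30,1,59,-999\n\x00"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
try:
    rows = read_rows_without_nulls(tmp.name)
finally:
    os.remove(tmp.name)

for stamp, values in rows:
    print(stamp, values)
```

With pandas, the cleaned text could presumably be handed over in the same way, for example `pd.read_csv(io.StringIO(text), header=None)`, and the first column converted with `pd.to_datetime(..., format="%Y%m%d %H:%M")`.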
    

    【Comments】:
