Converting .arff file to .csv using Python

【Question title】: Converting .arff file to .csv using Python
【Posted】: 2019-09-03 07:22:56
【Question】:

I have a file "LMD.rh.arff" that I am trying to convert to a .csv file with the following code:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import arff


# Read in the .arff file
data = arff.loadarff("LMD.rh.arff")

But this last line gives me the following error:

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
----> 1 data = arff.loadarff("LMD.rp.arff")

~/.local/lib/python3.6/site-packages/scipy/io/arff/arffread.py in loadarff(f)
    539
    540     try:
--> 541         return _loadarff(ofile)
    542     finally:
    543

~/.local/lib/python3.6/site-packages/scipy/io/arff/arffread.py in _loadarff(ofile)
    627
    628     # No error should happen here: it is a bug otherwise
--> 629     data = np.fromiter(a, descr)
    630     return data, meta
    631

UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 4: ordinal not in range(128)

In [6]: data = arff.loadarff("LMD.rh.arff")

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
----> 1 data = arff.loadarff("LMD.rh.arff")

~/.local/lib/python3.6/site-packages/scipy/io/arff/arffread.py in loadarff(f)
    539
    540     try:
--> 541         return _loadarff(ofile)
    542     finally:
    543

~/.local/lib/python3.6/site-packages/scipy/io/arff/arffread.py in _loadarff(ofile)
    627
    628     # No error should happen here: it is a bug otherwise
--> 629     data = np.fromiter(a, descr)
    630     return data, meta
    631

UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 4: ordinal not in range(128)
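The last line of the traceback is the standard message Python raises when a str containing a character above U+007F is encoded to bytes with the 'ascii' codec; '\xf3' is 'ó'. Because this is an encode error (str to bytes) and not a decode error (bytes to str), the offending value had already been read into a str when the failure happened. A minimal reproduction, using a hypothetical value "Forró" (an assumption, not taken from the file) that has 'ó' at index 4:

```python
# Hypothetical value: 'ó' (U+00F3) sits at index 4, matching "position 4" in the traceback
value = "Forró"


def ascii_error_message(text):
    """Return the UnicodeEncodeError message for text, or None if it is pure ASCII."""
    try:
        text.encode("ascii")  # str -> bytes with the ASCII codec
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


print(ascii_error_message(value))
# 'ascii' codec can't encode character '\xf3' in position 4: ordinal not in range(128)
```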

You can download the file here: arff_file

Any ideas about what is going wrong?

Thanks!

【Question discussion】:

Tags: python csv arff

【Answer 1】:

Try this:

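import os

path_to_directory = "./"
# Every .arff file in the directory
files = [f for f in os.listdir(path_to_directory) if f.endswith(".arff")]

def toCsv(content):
    """Turn the lines of an ARFF file into CSV lines: a header row, then the data rows."""
    data = False
    header = ""
    newContent = []
    for line in content:
        stripped = line.strip()
        if not data:
            # ARFF keywords are case-insensitive (@ATTRIBUTE, @Data, ...)
            lowered = stripped.lower()
            if lowered.startswith("@attribute"):
                # The attribute name is the token right after "@attribute"
                columnName = stripped.split()[1]
                header = header + columnName + ","
            elif lowered.startswith("@data"):
                data = True
                header = header[:-1]  # drop the trailing comma
                header += "\n"
                newContent.append(header)
        elif stripped and not stripped.startswith("%"):
            # Copy data rows as-is, skipping blank lines and % comments
            newContent.append(line)
    return newContent

# Main loop for reading and writing files
for file_name in files:
    # Without an explicit encoding, open() falls back to the locale default,
    # which may be ASCII; UTF-8 is an assumption about the input file
    with open(os.path.join(path_to_directory, file_name), "r", encoding="utf-8") as inFile:
        content = inFile.readlines()
        name, ext = os.path.splitext(inFile.name)
    new = toCsv(content)
    with open(name + ".csv", "w", encoding="utf-8") as outFile:
        outFile.writelines(new)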
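Answer 1 copies the @data lines through untouched, so single-quoted ARFF values keep their quotes in the CSV, and attribute names containing spaces are cut at the first space. A somewhat sturdier sketch using only the standard library (csv for the data rows, shlex for quoted attribute names) follows. It assumes dense data (sparse rows such as {0 1, 3 2} are not handled), single-quote quoting in the data section, and UTF-8 input; arff_to_csv_text and convert_file are hypothetical names.

```python
import csv
import io
import shlex


def arff_to_csv_text(arff_text):
    """Convert dense ARFF text to CSV text: attribute names as header, then data rows."""
    header = []
    rows = []
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and % comments in both the header and the data section
        if not stripped or stripped.startswith("%"):
            continue
        if not in_data:
            keyword = stripped.split(None, 1)[0].lower()  # keywords are case-insensitive
            if keyword == "@attribute":
                # shlex honours quotes: "@attribute 'sepal length' numeric" -> 'sepal length'
                header.append(shlex.split(stripped)[1])
            elif keyword == "@data":
                in_data = True
        else:
            # Parse one data row; ARFF values are usually quoted with single quotes
            rows.extend(csv.reader([stripped], quotechar="'", skipinitialspace=True))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # quotes values containing commas
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def convert_file(arff_path, csv_path, encoding="utf-8"):
    """Read an ARFF file with an explicit encoding and write it out as UTF-8 CSV."""
    with open(arff_path, encoding=encoding) as f:
        text = f.read()
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        f.write(arff_to_csv_text(text))
```

Calling convert_file("LMD.rh.arff", "LMD.rh.csv") would write the CSV next to the input. Going through csv.writer means values that contain commas are quoted correctly in the output, which plain line copying cannot guarantee.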

【Discussion】:

【Answer 2】:

Looking at the error trace:

UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 4: ordinal not in range(128)

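Your error indicates that there is an encoding problem with your file. Consider opening the file with the correct encoding first, and then passing it to the arff loader: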

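import codecs
import arff  # the liac-arff package (pip install liac-arff); scipy.io.arff has no load()

# codecs.open (not codecs.load) decodes the file; 'utf-8' is a guess, use whatever encoding you have
file_ = codecs.open('LMD.rh.arff', 'rb', 'utf-8')
data = arff.load(file_)  # now this should be fine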

See here for reference.
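Before choosing an encoding, it can help to see exactly which characters in the file are non-ASCII and where they are. A small sketch (non_ascii_chars is a hypothetical helper name) that reports line, column, character and Unicode name for any iterable of text lines, including an open file:

```python
import unicodedata


def non_ascii_chars(lines):
    """Yield (line_no, column, char, unicode_name) for each non-ASCII character."""
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # unicodedata.name falls back to "UNKNOWN" for unnamed code points
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")
```

For example: with open("LMD.rh.arff", encoding="utf-8") as f, iterate over non_ascii_chars(f) and print each hit. If that open call raises UnicodeDecodeError, the file is not UTF-8; latin-1 is a common alternative, but note it decodes any byte sequence without error, so a wrong guess shows up as odd characters rather than an exception.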

【Discussion】:

When I try the "codecs.load()" line, it says: AttributeError: module 'codecs' has no attribute 'load'
I tried the following code: f = codecs.open("LMD.rh.arff", "r", "utf-8"); data = arff.loadarff(f). However, it produces the same error.
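The same error with codecs.open fits the traceback: the exception is raised inside np.fromiter while scipy packs already-decoded str values into a NumPy record array, which suggests scipy is storing some value as an ASCII byte string. Changing how the file is opened therefore does not reach the problem. If staying with scipy.io.arff matters and losing accents is acceptable, one workaround (a suggestion, not from the thread) is to fold the text to plain ASCII first and hand loadarff an in-memory file, since it accepts file-like objects. ascii_fold and ascii_folded_arff are hypothetical helper names, and UTF-8 input is assumed. This changes the data: 'ó' becomes 'o', and characters with no ASCII decomposition (such as 'ß') are dropped.

```python
import io
import unicodedata


def ascii_fold(text):
    """Strip accents via NFKD decomposition, then drop anything still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)  # 'ó' -> 'o' + combining accent
    return decomposed.encode("ascii", "ignore").decode("ascii")


def ascii_folded_arff(path, encoding="utf-8"):
    """Return an in-memory, ASCII-only copy of an ARFF file for scipy's loader."""
    # encoding="utf-8" is an assumption about the input file
    with open(path, encoding=encoding) as f:
        return io.StringIO(ascii_fold(f.read()))
```

Usage would then be data, meta = arff.loadarff(ascii_folded_arff("LMD.rh.arff")), followed by pd.DataFrame(data).to_csv("LMD.rh.csv", index=False). Nominal columns come back from loadarff as bytes, so they may need decoding before the CSV is written.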