Split a multiple line text file into a multiple line csv file: answers

【Question title】: Split a multiple line text file into a multiple line csv file
【Posted】: 2014-06-09 20:54:12
【Question description】:

I have a text file containing data in the following format:

100157  100157
100157  364207
100157  38848
100157  bradshaw97introduction
100157  bylund99coordinating
100157  dix01metaagent
100157  gray99finding
...
...

I am trying to convert it into a scikit-readable dataset using the following:

datafile = open('filename.txt', 'r')
data = []
for row in datafile:
    data.append(row.strip().split('\t'))
datafile.close()

c = open('filename.csv', 'w')
arr = str(data)  # the whole list becomes one string, hence the single-line output
c.write(arr)
c.close()

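However, after running this code the data is written out on a single line, whereas I want it in CSV format, neatly separated into rows and columns, like the Iris dataset.

Could I get some help on how to proceed? Thanks.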

【Question comments】:

Please show us what the result should look like.

Tags: python csv numpy split scikit-learn

【Solution 1】:

Use the csv module:

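import csv

# newline='' lets the csv module control line endings (Python 3)
with open('filename.txt', 'r') as f, open('filename.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    # split each line on tabs and write it as one CSV row
    writer.writerows(line.rstrip().split('\t') for line in f)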

Output CSV file:

100157,100157
100157,364207
100157,38848
100157,bradshaw97introduction
100157,bylund99coordinating
100157,dix01metaagent
100157,gray99finding
...
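
To try the same split-and-write logic without touching the filesystem, here is a minimal sketch that runs it on an in-memory sample. The helper name `tsv_to_csv_text` and the sample rows are assumptions made for illustration and are not part of the original answer. It passes `lineterminator='\n'` only to keep the printed result easy to read.

```python
import csv
import io

def tsv_to_csv_text(text):
    """Convert tab-separated text to CSV text (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    # splitlines() drops the line endings, so only the tab split remains
    writer.writerows(line.split('\t') for line in text.splitlines())
    return out.getvalue()

sample = "100157\t100157\n100157\tbradshaw97introduction\n"
print(tsv_to_csv_text(sample))
```

One reason to prefer `csv.writer` over joining fields by hand is that it quotes any field that itself contains a comma, so the resulting file stays parseable.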

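【Comments】:

That was fast!! Thank you very much.
@falsetru Wouldn't line.split() be enough?
@cdhagmann, if the data contains no spaces, that's fine. But if a field contains spaces, it will split in the wrong place.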
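
The difference discussed in these comments can be seen in a small sketch. The sample row below is hypothetical (the data in the question has no spaces inside fields); it only illustrates what happens when a field does contain spaces.

```python
# Hypothetical row: two tab-separated fields, the second containing spaces
line = "100157\tgray finding 99\n"

by_tab = line.rstrip('\n').split('\t')   # split only on tabs
by_whitespace = line.split()             # split on any run of whitespace

print(by_tab)         # ['100157', 'gray finding 99']
print(by_whitespace)  # ['100157', 'gray', 'finding', '99']
```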

【Solution 2】:

Correct me if I'm wrong, but I believe a scikit-readable dataset is just space-separated values, with rows separated by \n?

If so, it's easy:

Suppose you have this file:

100157  100157
100157  364207
100157  38848
100157  bradshaw97introduction
100157  bylund99coordinating
100157  dix01metaagent
100157  gray99finding

It is tab-separated.

You can easily convert it to space-separated values with newline-separated rows:

with open('/tmp/test.csv', 'r') as fin, open('/tmp/test.out', 'w') as fout:
    data = [row.strip().split('\t') for row in fin]
    st = '\n'.join(' '.join(e) for e in data)
    fout.write(st)

print(data)
# [['100157', '100157'], ['100157', '364207'], ['100157', '38848'], ['100157', 'bradshaw97introduction'], ['100157', 'bylund99coordinating'], ['100157', 'dix01metaagent'], ['100157', 'gray99finding']]
print(st)
# 100157 100157
# 100157 364207
# 100157 38848
# 100157 bradshaw97introduction
# 100157 bylund99coordinating
# 100157 dix01metaagent
# 100157 gray99finding
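
For larger files, the same conversion can be done one row at a time instead of building the whole list first. This is only a sketch of that variation, not the answer's own code: the function name `tabs_to_spaces` is hypothetical, and in-memory `io.StringIO` objects stand in for the real files. Unlike the version above, it ends the output with a trailing newline.

```python
import io

def tabs_to_spaces(fin, fout):
    """Copy rows from fin to fout, replacing tab separators with single spaces."""
    for row in fin:
        fields = row.rstrip('\n').split('\t')
        fout.write(' '.join(fields) + '\n')

# In-memory stand-ins for the input and output files (assumption for the demo)
src = io.StringIO("100157\t100157\n100157\t364207\n")
dst = io.StringIO()
tabs_to_spaces(src, dst)
print(dst.getvalue())
```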

【Comments】: