【Posted】:2017-10-19 12:07:45
【Question】:
I'm new to Python. I'm trying to combine two CSV files into one, selecting specific rows and columns.
csv1:
Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
csv2:
Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Here is the code that combines the two files:
import csv
import itertools as IT

filenames = ['csv1.csv', 'csv2.csv']
# newline='' lets the csv module handle line endings itself
handles = [open(filename, newline='') for filename in filenames]
readers = [csv.reader(f, delimiter=',') for f in handles]

with open('combined.csv', 'w', newline='') as h:
    writer = csv.writer(h, delimiter=',', lineterminator='\n')
    # pair up the n-th row of each file; pad the shorter file with blanks
    for rows in IT.zip_longest(*readers, fillvalue=[''] * 3):
        combined_row = []
        for row in rows:
            row = row[:3]  # select the columns you want
            if len(row) == 3:
                combined_row.extend(row)
            else:
                combined_row.extend([''] * 3)
        writer.writerow(combined_row)

for f in handles:
    f.close()
This combines the files and outputs the following:
Host, Time Up, Time Down,Host,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),server1.test.com:1717,application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),,server_hit_rate,100.000% (100.000%)
Average,100.000% (100.000%),0.000% (0.000%),,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,server2.test.com:1717,application_availability_check,100.000% (100.000%)
,,,,server_hit_rate,100.000% (100.000%)
,,,,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,Average,100.000% (100.000%),0.000% (0.000%)
But what I actually want to extract from csv1 and csv2 is only the following:
Host, Time Up, Time Down,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)
Is there any way to achieve this?
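For illustration, the desired output above amounts to matching rows on the Host column rather than pairing them by position. The following is only a minimal sketch under some assumptions: it keeps the first csv2 row that names each host (the rows with an empty Host cell are skipped), it drops the Average rows, and the function name `merge_reports` and the trimmed sample data are made up for this example.

```python
import csv
import io

# Trimmed copies of the two files from the question (assumed sample data).
CSV1 = """Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
"""

CSV2 = """Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
"""


def merge_reports(host_file, service_file):
    """Return rows of Host, Time Up, Time Down, Service, Time OK (hypothetical helper)."""
    host_rows = list(csv.reader(host_file))
    service_rows = list(csv.reader(service_file))

    # Index csv2 by host. Only rows whose first cell is non-empty name a host;
    # the 'Average' summary row is skipped on purpose.
    services = {}
    for row in service_rows[1:]:
        if row and row[0] and row[0] != 'Average':
            services[row[0]] = row[1:3]  # Service, Time OK

    # Combined header: first three columns of csv1 plus columns 2-3 of csv2.
    merged = [host_rows[0][:3] + service_rows[0][1:3]]
    for row in host_rows[1:]:
        if row and row[0] in services:
            merged.append(row[:3] + services[row[0]])
    return merged


merged = merge_reports(io.StringIO(CSV1), io.StringIO(CSV2))
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(merged)
print(out.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `newline=''`, and `out` by the opened output file.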
【Comments】:
-
What criteria are you using to decide which rows (plus the header) to extract?
-
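I tried to fix the indentation in your script, but I can't be 100% sure I got it right. Please check it, and use 4 spaces for indentation rather than tabs. The placement of writer.writerow(combined_row) is especially important. By the way, shouldn't the line before it be combined_row.extend(['']*3) rather than combined.extend(['']*3)?
-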
@Adirio That looks like a typo I made when pasting the code here.
-
That's what I thought. Please check it and edit your question.
-
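@PeterMularien Does the criterion have to be an index? If you open the file in Microsoft Office Excel, the values I mentioned appear under the headers (Host, Time Up, Time Down, Service, Time OK), so I want to extract whatever is present in those columns.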
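As a rough illustration of picking columns by header name instead of by index, here is a small sketch using `csv.DictReader`. It assumes the headers are spelled as in the question; `skipinitialspace=True` is used because the header cells there have a space after each comma. The helper name `select_columns` is hypothetical.

```python
import csv
import io


def select_columns(csv_file, wanted):
    """Yield each data row reduced to the columns named in `wanted`."""
    # skipinitialspace=True turns the header ' Time Up' into 'Time Up'
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    for record in reader:
        yield [record[name] for name in wanted]


# One header line and one data line copied from csv1 in the question.
sample = io.StringIO(
    "Host, Time Up, Time Down, Time Unreachable, Time Undetermined\n"
    "server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000\n"
)
rows = list(select_columns(sample, ['Host', 'Time Up', 'Time Down']))
print(rows)
```

Selecting by name keeps working if the column order changes, which slicing with `row[:3]` does not.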
Tags: python python-3.x pandas csv