【Question title】: Read and parse through CSV file [python3.6]
【Posted】: 2017-10-19 12:07:45
【Question】:

I am very new to Python. I am trying to merge two CSV files into one, selecting only specific rows and columns.

csv1:

Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000

csv2:

Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000

Here is the code that combines the two files:

import csv
import itertools as IT

filenames = ['csv1.csv', 'csv2.csv']
# Python 3: open CSV files in text mode with newline='' (not 'rb')
handles = [open(filename, 'r', newline='') for filename in filenames]
readers = [csv.reader(f, delimiter=',') for f in handles]

with open('combined.csv', 'w', newline='') as h:
    writer = csv.writer(h, delimiter=',', lineterminator='\n')
    # izip_longest was renamed to zip_longest in Python 3
    for rows in IT.zip_longest(*readers, fillvalue=['']*3):
        combined_row = []
        for row in rows:
            row = row[:3]  # select the columns you want
            if len(row) == 3:
                combined_row.extend(row)
            else:
                # pad short rows (originally pasted as combined.extend, a typo)
                combined_row.extend(['']*3)
        writer.writerow(combined_row)

for f in handles:
    f.close()

This combines the files and outputs the following:

Host, Time Up, Time Down,Host,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),server1.test.com:1717,application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),,server_hit_rate,100.000% (100.000%)
Average,100.000% (100.000%),0.000% (0.000%),,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,server2.test.com:1717,application_availability_check,100.000% (100.000%)
,,,,server_hit_rate,100.000% (100.000%)
,,,,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,Average,100.000% (100.000%),0.000% (0.000%)
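
The output above is misaligned because zip_longest pairs rows purely by position, not by the Host value. A minimal sketch of that behaviour, using made-up two-column rows (the names left, right, and paired are illustrative only):

```python
from itertools import zip_longest

# Hypothetical, shortened stand-ins for the rows of csv1 and csv2.
left = [["server1", "up1"], ["server2", "up2"]]
right = [["server1", "check_a"], ["", "check_b"], ["server2", "check_a"]]

# zip_longest pairs the n-th row of each input regardless of the host name,
# and pads the shorter input with the fill value once it runs out.
paired = [l + r for l, r in zip_longest(left, right, fillvalue=["", ""])]

for row in paired:
    print(row)
```

Here server2 from the first list ends up next to the second row of server1's checks, which is the same effect seen in the combined file.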

But I only want to extract the following from csv1 and csv2:

Host, Time Up, Time Down,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)

Is there any way to achieve this?

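【Comments】:

  • What criteria are you using to decide which rows (plus the header) to extract?
  • I tried to fix the indentation in your script, but I can't be 100% sure it is right. Please check it, and indent with 4 spaces rather than tabs. The position of writer.writerow(combined_row) is especially important. By the way, shouldn't the line before it be combined_row.extend(['']*3) rather than combined.extend(['']*3)?
  • @Adirio That looks like a typo I made when pasting the code here.
  • That's what I thought. Please check and edit your question.
  • @PeterMularien Do the criteria have to be index-based? As you can see in the Microsoft Office Excel values I mentioned, the headers are (Host, Time Up, Time Down, Service, Time OK), so I want to extract whatever is in those columns.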
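
The selection rule discussed in these comments (keep the named columns, and only rows that carry a real host name) can be sketched with the csv module alone. This is only one possible reading of the requirement; the inline CSV1/CSV2 strings are shortened sample data, and read_rows and combine are hypothetical helper names:

```python
import csv
import io

# Shortened stand-ins for csv1.csv and csv2.csv (assumed sample data).
CSV1 = """Host, Time Up, Time Down
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""
CSV2 = """Host,Service, Time OK, Time Warning
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,100.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""

def read_rows(text):
    # skipinitialspace drops the blank after each comma in the header row,
    # so the key is 'Time OK' rather than ' Time OK'.
    return list(csv.DictReader(io.StringIO(text), skipinitialspace=True))

def combine(csv1_text, csv2_text):
    # Keep only csv2 rows that carry a real host name: this skips the
    # continuation rows with an empty Host and the trailing "Average" row.
    services = {
        row["Host"]: row
        for row in read_rows(csv2_text)
        if row["Host"] and row["Host"] != "Average"
    }
    out = [["Host", "Time Up", "Time Down", "Service", "Time OK"]]
    for row in read_rows(csv1_text):
        match = services.get(row["Host"])
        if match is None:
            continue  # e.g. "Average" has no per-host service line
        out.append([row["Host"], row["Time Up"], row["Time Down"],
                    match["Service"], match["Time OK"]])
    return out

rows = combine(CSV1, CSV2)
```

The resulting rows list could then be written out with csv.writer(...).writerows(rows).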

Tags: python python-3.x pandas csv


【Solution 1】:
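import pandas as pd

# skipfooter=1 drops the trailing "Average" row and needs engine='python';
# skipinitialspace=True strips the blank after each comma (' Time OK' -> 'Time OK')
df = pd.read_csv('csv1.csv', skipfooter=1, engine='python', skipinitialspace=True)
df2 = pd.read_csv('csv2.csv', skipfooter=1, engine='python', skipinitialspace=True)

# csv1 holds Time Up / Time Down; csv2 holds Service / Time OK
combined = pd.merge(df[['Host', 'Time Up', 'Time Down']], df2[['Host', 'Service', 'Time OK']], on='Host')

# keep only the text before the parenthesis: '100.000% (100.000%)' -> '100.000%'
for col in ['Time Up', 'Time Down', 'Time OK']:
    combined[col] = combined[col].apply(lambda x: x.split('(')[0].strip())

combined.to_csv('combined.csv', index=False)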

You should be able to solve this easily with pandas. Is that an option for you?

Output:

,Host, Time Up, Time Down,Service, Time OK
0,server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)
1,server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)
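
To try the merge without touching any files, here is a hedged, self-contained sketch that feeds shortened sample data through io.StringIO. The CSV1/CSV2 strings and the load helper are assumptions for illustration; the pandas options used (skipinitialspace, skipfooter, engine) are standard read_csv parameters:

```python
import io
import pandas as pd

# Shortened stand-ins for csv1.csv and csv2.csv (assumed sample data).
CSV1 = """Host, Time Up, Time Down
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""
CSV2 = """Host,Service, Time OK, Time Warning
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,100.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""

def load(text):
    # skipinitialspace strips the blank after each comma in the header;
    # skipfooter=1 drops the trailing "Average" row (python engine only).
    return pd.read_csv(io.StringIO(text), skipinitialspace=True,
                       skipfooter=1, engine="python")

hosts = load(CSV1)[["Host", "Time Up", "Time Down"]]
services = load(CSV2)[["Host", "Service", "Time OK"]]

# Continuation rows in csv2 have an empty Host (read as NaN), so an inner
# merge on Host keeps only the first service line of each server.
combined = hosts.merge(services, on="Host", how="inner")

# index=False leaves out the 0, 1, 2, ... column when writing CSV text.
csv_text = combined.to_csv(index=False)
```

The inner merge is what filters out the extra service rows here; no explicit row selection is needed as long as only the first line of each server carries the host name.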

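【Comments】:

  • It raises an error: raise KeyError('%s not in index' % objarr[mask]) KeyError: "['Time OK'] not in index", even though that column is present in the file itself.
  • I solved it: in my CSV there is a space before 'Time OK'. I still can't get the desired output, though, because I get 0, 1, 2, ... as the first column, which I don't want. Can I drop that column? Also, if I want to strip the parenthesized part (100%) in columns 3 and 4, is that possible with the pandas module?
  • I have edited the solution and included the output I see.
  • Take a look at the change I just made. Do you get the expected output now?
  • Works perfectly. Just wondering: is the line "combined['Time OK'] = combined['Time OK'].apply(lambda x: x.split('(')[0])" extracting the first row and the application_availability check column data? If I want the second column's data instead, should I write "combined['Time OK'] = combined['Time OK'].apply(lambda x: x.split('(')[1])"?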
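
Two points from the comments above can be checked in plain Python: why 'Time OK' was not found, and what the index after split('(') selects. The sample value below is hypothetical (chosen so the two parts differ); apply runs the lambda on every value in the column, so the index picks a piece of each string, not a row or column of the table:

```python
# The header as it appears in csv2: note the blank after each comma.
header_line = "Host,Service, Time OK, Time Warning"
raw_names = header_line.split(",")
clean_names = [name.strip() for name in raw_names]

# ' Time OK' (with a leading space) is a different key from 'Time OK',
# which is what led to the KeyError.
has_plain_key = "Time OK" in raw_names      # False
has_spaced_key = " Time OK" in raw_names    # True

# Hypothetical cell value, with different numbers to tell the parts apart.
value = "99.500% (98.000%)"
parts = value.split("(")                    # ['99.500% ', '98.000%)']
before = parts[0].strip()                   # text before the parenthesis
inside = parts[1].rstrip(")")               # text inside the parentheses
```

So [0] keeps the figure in front of the parenthesis and [1] would keep the one inside it (with a trailing ")" unless it is stripped).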