【Question title】: column operation in csv [python]
【Posted】: 2023-03-08 18:05:01
【Question description】:

I have a scenario where I need to extract row values from csv files.

(CSV) test1:

Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000

(CSV) test2:

Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000

Here is my code:

df = pd.read_csv('test1.csv',skipfooter=1)
df2 = pd.read_csv('test2.csv',skipfooter=1)
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])

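Here I am trying to get the value for "server_hit_rate", which is 99% and sits on line 3 of the data. But with the code above I can only get the data from the first row, i.e.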

                    Host    Time Up    Time OK
0  server1.test.com:1717  100.000%   100.000% 
1  server2.test.com:1717  100.000%   100.000%

The desired output should be:

                    Host    Time Up    Time OK
0  server1.test.com:1717  100.000%    99.000% 
1  server2.test.com:1717  100.000%    99.000% 

Any suggestions on how to achieve this would be helpful.

Edit 1:

import pandas as pd
import pandas
import os, shutil, glob
import sys
import datetime
import time
def t1():
    import pandas as pd
    import pandas
    today=datetime.datetime.utcnow().strftime("%a %b %d %H:%M:%S %Z %Y")
    print "date :", today
    df = pd.read_csv('t1.csv',skipfooter=1, engine='python')
    df2 = pd.read_csv('t2.csv',skipfooter=1, engine='python')
    temp = df2.ffill()[df2['Service']=='server_hit_rate']
    combined = pd.merge(df[['Host',' Time Up']],temp[['Host',' Time OK']], on='Host')
    combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
    combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
    combined.to_csv('test.csv',index=False)
t1()


O/P:

Wed Nov 15 10:07:01  2017
Empty DataFrame
Columns: [Host, % Time Up, % Time OK]
Index: []

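【Comments】:

  • The data in test2 appears to be one long csv string (notice that every line starts with a ,). Is that how it is supposed to be?
  • Yes, the data is exactly like that.
  • Then those are not new rows. If it is just 1 long string containing , (or even if it is a list), then you will only get 1 row.
  • OK, so is there any way to do this?
  • Lots of ways, assuming you actually have rows of csv data. Your test1 is valid, i.e. 4 rows of csv data (by rows, each one ends with '\n'). But in test2, notice how you really only have 2 rows. The first starts with `Host` and the second starts with `Average`. Everything else is one giant csv row.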
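
Whether the lines that begin with a comma are separate rows can be checked directly. The sketch below is only an illustration: it assumes the file has real newlines between lines (as the listing in the question suggests) and uses a shortened, in-memory copy of test2 rather than the actual file.

```python
import csv
import io

# Assumption: a shortened copy of test2, with a newline after every line.
raw = (
    "Host,Service, Time OK\n"
    "server1.test.com:1717,application_availability_check,100.000% (100.000%)\n"
    ",server_hit_rate,99.000% (100.000%)\n"
)

rows = list(csv.reader(io.StringIO(raw)))

print(len(rows))  # number of parsed rows, header included
print(rows[2])    # the row that starts with a comma
```

Under that assumption, a line starting with a comma is parsed as its own row whose first field (Host) is simply empty, which is why a forward fill of Host is needed before filtering.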

Tags: python python-2.7 python-3.x pandas csv


【Solution 1】:

If you forward-fill Host and then select the rows whose Service is server_hit_rate, merging the data becomes fairly simple, i.e.

temp = df2.ffill()[df2['Service']=='server_hit_rate']

#                 Host          Service             Time OK      ...
#1  server1.test.com:1717  server_hit_rate  99.000% (100.000%)   ...
#6  server2.test.com:1717  server_hit_rate  99.000% (100.000%)   ...

combined = pd.merge(df[['Host',' Time Up']],temp[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])

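Output of the dataframe combined:

print(combined)
                    Host    Time Up    Time OK
0  server1.test.com:1717  100.000%    99.000%
1  server2.test.com:1717  100.000%    99.000%

Also, rather than keeping the spaces in front of the column names, strip them with

# strip whitespace from the column names
df.columns = df.columns.str.strip()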
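
For reference, the steps above can be put together into one runnable sketch. It is not the answerer's exact code: the two reports are shortened, in-memory copies of the data in the question (an assumption, read through io.StringIO instead of from disk), the column names are stripped first, and only the Host column is forward-filled.

```python
import io
import pandas as pd

# Assumption: shortened copies of test1.csv and test2.csv from the question.
TEST1 = """Host, Time Up, Time Down
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""
TEST2 = """Host,Service, Time OK, Time Warning
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,99.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,99.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""

# skipfooter drops the trailing "Average" line; it requires the python engine.
df = pd.read_csv(io.StringIO(TEST1), skipfooter=1, engine='python')
df2 = pd.read_csv(io.StringIO(TEST2), skipfooter=1, engine='python')

# Remove the leading spaces from the column names.
df.columns = df.columns.str.strip()
df2.columns = df2.columns.str.strip()

# Empty Host cells are read as NaN; copy the host name down onto them.
df2['Host'] = df2['Host'].ffill()

# Keep only the rows for the service of interest, then merge on Host.
temp = df2[df2['Service'] == 'server_hit_rate']
combined = pd.merge(df[['Host', 'Time Up']], temp[['Host', 'Time OK']], on='Host')

# Keep the part before the parenthesis, e.g. "99.000% (100.000%)" -> "99.000%".
for col in ('Time Up', 'Time OK'):
    combined[col] = combined[col].str.split('(').str[0].str.strip()

print(combined)
```

Forward-filling only Host, rather than the whole frame, avoids accidentally copying values into other columns that happen to be empty.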

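【Comments】:

  • Thanks for the reply. I checked the code above. It is not working. Am I missing something here? I have added the whole script to the question.
  • What is not working? If you are on the first line, you should get it right.
  • It doesn't generate the file (test.csv) in the first place. At the same time, it doesn't raise an error either.
  • Can you add the output of combined.head() now? I thought the question was how to get the output. So next time make sure to mention that you want to export the output to csv.
  • "It doesn't generates the file first of all" - check the directory where you are saving the file once more; the csv file is generated when I try it on my machine.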
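
When a merge silently returns an empty frame, as in the output added to the question, one way to investigate is an outer merge with indicator=True, which shows which keys failed to match. The sketch below is an illustration only: the trailing space in the right-hand Host values is a made-up mismatch, not the confirmed cause of the asker's empty result.

```python
import pandas as pd

left = pd.DataFrame({
    'Host': ['server1.test.com:1717', 'server2.test.com:1717'],
    'Time Up': ['100.000%', '100.000%'],
})
# Hypothetical mismatch: the keys on this side carry a trailing space.
right = pd.DataFrame({
    'Host': ['server1.test.com:1717 ', 'server2.test.com:1717 '],
    'Time OK': ['99.000%', '99.000%'],
})

# The default inner merge finds no common keys and returns an empty frame.
inner = pd.merge(left, right, on='Host')
print(inner.empty)

# An outer merge with an indicator column shows where each key came from.
check = pd.merge(left, right, on='Host', how='outer', indicator=True)
print(check[['Host', '_merge']])

# After normalising the keys the inner merge matches again.
right['Host'] = right['Host'].str.strip()
fixed = pd.merge(left, right, on='Host')
print(fixed)
```

Printing df.columns and df2.columns is a similarly cheap check, since the output in the question lists columns named "% Time Up" and "% Time OK" rather than the names used in the code.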
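【Solution 2】:

The DictReader tool in the csv library is handy for this kind of thing - it turns the column headers into dictionary keys, and you can then query each row like any other dict.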

from csv import DictReader

with open('test2.csv', newline='') as csvfile:
    srcdat = DictReader(csvfile)
    csvdict = [line for line in srcdat]

for row in csvdict:
    if row['Host']:
        current_host = row['Host']
    q = row[' Time OK']
    q = q.split('.')[0]
    if int(q) <100:
        print(f'Host failure for: {current_host}')
        print('Time OK: ', row[' Time OK'])

The output is not in the format you want, but it should give you a basis to build on.
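
Building on that, here is a minimal sketch of how the same DictReader loop could collect the server_hit_rate value per host. The function name hit_rates and the shortened in-memory data are assumptions made for illustration; skipinitialspace=True is a standard csv option that drops the space after each comma, so the header becomes "Time OK" instead of " Time OK".

```python
import csv
import io

# Assumption: a shortened copy of test2.csv from the question.
TEST2 = """Host,Service, Time OK, Time Warning
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,99.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,99.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""


def hit_rates(lines, service='server_hit_rate'):
    """Return {host: 'Time OK' percentage} for the given service (hypothetical helper)."""
    reader = csv.DictReader(lines, skipinitialspace=True)
    current_host = None
    result = {}
    for row in reader:
        # Rows with an empty Host belong to the most recent named host.
        if row['Host']:
            current_host = row['Host']
        if row['Service'] == service:
            # "99.000% (100.000%)" -> "99.000%"
            result[current_host] = row['Time OK'].split('(')[0].strip()
    return result


rates = hit_rates(io.StringIO(TEST2))
print(rates)
```

Because the Average line has no matching Service value, it is skipped without special handling. With a real file, the same function can be called on the object returned by open('test2.csv', newline='').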

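【Comments】:

  • What exactly will this do? I am reading two files. What are you doing after merging the files?
  • Sorry - I ran it against your second file rather than the first. Personally I would turn it into a function or even a class, run it on each file in turn, and then merge the answers rather than the whole files. The output is rather inelegant, but it gets the answer you want: Host failure for: server1.test.com:1717 Time OK: 99.000% (100.000%) Host failure for: server2.test.com:1717 Time OK: 99.000% (100.000%) Host failure for: Average Time OK: 0.000% (0.000%)
  • That was not my question regarding the desired output :-) Thanks for your help.
【Solution 3】:

I think this is better code for getting the result you want. Note that I have not kept the "%", since you have indicated that you want to pick the larger column later on. This way we convert to numbers and only use the columns we need, and we also get rid of the annoying spaces in the column names right from the start. By setting the index, we can let Pandas line up the entries without calling merge.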

import pandas as pd

def parse_percentage(perc_string):
    "Parse the percentage strings of the form 99.00% (99.00%)"
    return float(perc_string.split('%')[0])

t1 = pd.read_csv('t1.csv',
                 skipfooter=1,
                 engine='python',
                 sep=' *, *',  # This gets rid of the spaces
                 index_col='Host',
                 usecols=['Host', 'Time Up'],
                 converters={'Time Up': parse_percentage})

t2 = pd.read_csv('t2.csv',
                 skipfooter=1,
                 engine='python',
                 sep=' *, *',
                 usecols=['Host', 'Service', 'Time OK'],
                 converters={'Time OK': parse_percentage}).ffill().set_index('Host')

# Both frames are indexed by Host, so concat lines the rows up by label
combined = pd.concat([t1, t2[t2.Service == 'server_hit_rate']['Time OK']], axis=1)
combined.to_csv('test.csv')
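
The index alignment that replaces merge here can be seen in isolation with a small sketch. The two Series below are built by hand rather than read from the files, and the 98.500% value is made up so that the two hosts are distinguishable; only parse_percentage is taken from the answer.

```python
import pandas as pd


def parse_percentage(perc_string):
    """Parse percentage strings of the form '99.00% (99.00%)' into a float."""
    return float(perc_string.split('%')[0])


# Hand-built stand-ins for the parsed columns (values are illustrative only).
time_up = pd.Series(
    {'server1.test.com:1717': '100.000% (100.000%)',
     'server2.test.com:1717': '100.000% (100.000%)'},
    name='Time Up',
).map(parse_percentage)

# Deliberately listed in the opposite order to show that labels, not positions, are matched.
time_ok = pd.Series(
    {'server2.test.com:1717': '98.500% (100.000%)',
     'server1.test.com:1717': '99.000% (100.000%)'},
    name='Time OK',
).map(parse_percentage)

# axis=1 places the Series side by side, aligned on the shared Host index.
combined = pd.concat([time_up, time_ok], axis=1)
print(combined)
```

Since the values are floats, later numeric comparisons between the columns work without further string handling.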

【Comments】:

    【Solution 4】:

    I used Python 3.6. I think this should do what you need.

    import pandas as pd
    
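    # skipfooter is only supported by the python engine; naming it explicitly avoids a parser warning
    df1 = pd.read_csv('t1.csv', skipfooter=1, engine='python')
    df1.columns = [c.strip() for c in df1.columns]
    df2 = pd.read_csv('t2.csv', skipfooter=1, engine='python')
    df2.columns = [c.strip() for c in df2.columns]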
    df2 = df2.ffill()
    combined = pd.merge(df1[['Host', 'Time Up']], df2[['Host', 'Service', 'Time OK']], on='Host')
    combined['Time Up'] = combined['Time Up'].apply(lambda x : x.split('(')[0])
    combined['Time OK'] = combined['Time OK'].apply(lambda x : x.split('(')[0])
    print(combined[combined.Service == 'server_hit_rate'])
    

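    【Comments】:

      【Solution 5】:

      Answering your challenge was a pleasant coffee break in my day. See the code below. It works for both the CSV1 and CSV2 files, because I created server-name and search-key variables for your search. To help with the learning curve, "# + comment" lines are added where needed. No extra imports or anything else required. Just plain pythonic writing.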

      #!/usr/bin/env python
      # -*- coding: utf-8 -*-
      
      # lists: csv1 and csv2 mimick reading from file.
      
      csv1 =  ["Host, Time Up, Time Down, Time Unreachable, Time Undetermined",
               "server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               "server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               "Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000"]
      
      csv2 =  ["Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined",
               "server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               ",server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               ",max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               ",application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               ",application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               "server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               ",server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               ",max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               ",application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               ",application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
               "Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000"]
      
      # assuming your provided data comes from a static file on hdd and can be read by using readline().
      
      total_servers        = 2
      count_server         = 0
      current_server_name  = ''
      result_dict          = {}
      
      # added implementable server-number; just in case you got multiple servers as your example shows.
      server_name = "server%s.test.com:"
      search_key = ",server_hit_rate"
      
      # the while-loop ploughs/iters through the file for a reason: > someone may have changed the order of servernames randomly.
      
      while count_server < total_servers:
          for line in csv2:
          #    print line  # -> to check output on screen
      
              current_server_name = server_name % str(count_server + 1) # Some folks..start counting at "1"...
      
              if line.startswith((current_server_name)):
                  print(current_server_name)
      
              if not line.startswith((search_key)):
                  continue
              else:
      #            print current_server_name
                  print('got your line of interest : "%s"' % line)  # -> to check output on screen
                  items = line.split(',')
                  value = items[2]
                  result_dict[current_server_name] = value
      
                  count_server +=1
      
      print(result_dict)
      

      Enjoy!

      【Comments】:
