Posted: 2023-03-08 18:05:01
Question:
I have a scenario where I need to extract row values from CSV files.
(CSV) test1:
Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
(CSV) test2:
Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Here is my code:
df = pd.read_csv('test1.csv',skipfooter=1)
df2 = pd.read_csv('test2.csv',skipfooter=1)
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
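A small side note on the last two lines: `x.split('(')[0]` keeps the space that sits before the parenthesis, so the result is `"100.000% "` rather than `"100.000%"`. A minimal sketch of a vectorised equivalent using pandas' `.str` accessor, assuming the cells look like the sample (the `cells` series is illustrative):

```python
import pandas as pd

# Values shaped like the sample's " Time Up"/" Time OK" cells (illustrative).
cells = pd.Series(["100.000% (100.000%)", "99.000% (100.000%)"])

# Same idea as .apply(lambda x: x.split('(')[0]) but vectorised; the extra
# .str.strip() drops the space that would otherwise remain before "(".
percent_text = cells.str.split("(", n=1).str[0].str.strip()

# Optional: a numeric version, handy for comparisons such as "< 100".
percent_value = percent_text.str.rstrip("%").astype(float)

print(percent_text.tolist())
print(percent_value.tolist())
```

Keeping the numeric column alongside the text one is just a convenience; drop it if only the formatted string is needed.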
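Here I am trying to get the value of "server_hit_rate", which is 99% and is on the third line of the file. But with the code above I only get the data from the first line of each host, i.e.: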
Host Time Up Time OK
0 server1.test.com:1717 100.000% 100.000%
1 server2.test.com:1717 100.000% 100.000%
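The reason only the first line per host comes back can be reproduced with trimmed copies of the two samples held in strings (the `TEST1`/`TEST2` names are only for illustration). This is a minimal sketch, assuming the real files are laid out like the posted samples: continuation lines in test2 start with an empty Host field, so they have nothing to match on.

```python
import io

import pandas as pd

# Trimmed copies of test1/test2 (illustrative; only the columns used here).
TEST1 = """Host, Time Up
server1.test.com:1717,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%)
"""
TEST2 = """Host,Service, Time OK
server1.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
"""

df = pd.read_csv(io.StringIO(TEST1))
df2 = pd.read_csv(io.StringIO(TEST2))

# Lines that start with "," have an empty Host field, which pandas reads as NaN.
services_without_host = df2.loc[df2["Host"].isna(), "Service"].tolist()

# NaN never matches a host name, so merging on Host only pairs up the first
# line of each block (application_availability_check), never server_hit_rate.
merged = pd.merge(df[["Host", " Time Up"]],
                  df2[["Host", "Service", " Time OK"]], on="Host")
print(merged[["Host", "Service", " Time OK"]])
```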
The desired output is:
Host Time Up Time OK
0 server1.test.com:1717 100.000% 99.000%
1 server2.test.com:1717 100.000% 99.000%
Any suggestions on how to achieve this would be helpful.
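One possible approach, sketched under the assumption that the real files match the samples: forward-fill the Host column so every service line knows its host, keep only the `server_hit_rate` lines, then merge. The helper names (`load`, `first_percentage`, `service_report`) are hypothetical, and the data is embedded as strings (test2 trimmed to three services per host) so the sketch runs on its own.

```python
import io

import pandas as pd

# Copies of the question's sample files (test2 trimmed to three services per
# host). With real files, pass the paths instead of io.StringIO objects.
TEST1 = """Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
"""
TEST2 = """Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
"""


def load(source):
    # skipfooter=1 drops the trailing "Average" line; it requires engine="python".
    df = pd.read_csv(source, skipfooter=1, engine="python")
    # Headers such as " Time Up" start with a space; normalise them once here.
    df.columns = df.columns.str.strip()
    return df


def first_percentage(col):
    # "99.000% (100.000%)" -> "99.000%"
    return col.str.split("(", n=1).str[0].str.strip()


def service_report(up_source, svc_source, service="server_hit_rate"):
    up = load(up_source)
    svc = load(svc_source)
    # Only the first line of each block names the host; copy it downwards.
    svc["Host"] = svc["Host"].ffill()
    # Keep just the requested service line for each host.
    svc = svc[svc["Service"].str.strip() == service]
    combined = pd.merge(up[["Host", "Time Up"]], svc[["Host", "Time OK"]], on="Host")
    for col in ("Time Up", "Time OK"):
        combined[col] = first_percentage(combined[col])
    return combined


print(service_report(io.StringIO(TEST1), io.StringIO(TEST2)))
```

Stripping the header names up front means the rest of the code does not have to repeat the leading space in `' Time Up'`; passing a different `service` value selects another line from each host block.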
Edit 1:
import pandas as pd
import pandas
import os, shutil, glob
import sys
import datetime
import time

def t1():
    import pandas as pd
    import pandas
    today = datetime.datetime.utcnow().strftime("%a %b %d %H:%M:%S %Z %Y")
    print "date :", today
    df = pd.read_csv('t1.csv', skipfooter=1, engine='python')
    df2 = pd.read_csv('t2.csv', skipfooter=1, engine='python')
    temp = df2.ffill()[df2['Service'] == 'server_hit_rate']
    combined = pd.merge(df[['Host', ' Time Up']], temp[['Host', ' Time OK']], on='Host')
    combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
    combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
    combined.to_csv('test.csv', index=False)

t1()
O/P:
Wed Nov 15 10:07:01 2017
Empty DataFrame
Columns: [Host, % Time Up, % Time OK]
Index: []
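The printed column names (`% Time Up`, `% Time OK`) differ from the sample headers, which suggests the real files are not byte-for-byte the posted samples. One way to narrow down where the rows disappear is a small diagnostic like the sketch below. The `diagnose` helper is hypothetical, and the data it runs on is made up: the leading space in the Service cells is an assumption chosen to show one way the `== 'server_hit_rate'` filter can silently match nothing. It is not known to be the actual cause here.

```python
import io

import pandas as pd


def diagnose(up, svc, service="server_hit_rate"):
    """Hypothetical helper: show which step of the Edit 1 pipeline loses the rows."""
    filled = svc.assign(Host=svc["Host"].ffill())
    exact = filled[filled["Service"] == service]
    stripped = filled[filled["Service"].astype(str).str.strip() == service]
    up_hosts = set(up["Host"].astype(str).str.strip())
    svc_hosts = set(stripped["Host"].astype(str).str.strip())
    return {
        # repr() makes leading/trailing spaces in labels visible.
        "up_columns": [repr(c) for c in up.columns],
        "svc_columns": [repr(c) for c in svc.columns],
        "exact_service_rows": len(exact),
        "stripped_service_rows": len(stripped),
        "hosts_in_both": sorted(up_hosts & svc_hosts),
    }


# Hypothetical data: the Service cells carry a leading space, which the
# question's sample does not show; it only illustrates one failure mode.
UP = """Host,% Time Up
server1.test.com:1717,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%)
"""
SVC = """Host,Service,% Time OK
server1.test.com:1717, application_availability_check,100.000% (100.000%)
, server_hit_rate,99.000% (100.000%)
server2.test.com:1717, application_availability_check,100.000% (100.000%)
, server_hit_rate,99.000% (100.000%)
"""

report = diagnose(pd.read_csv(io.StringIO(UP)), pd.read_csv(io.StringIO(SVC)))
for key, value in report.items():
    print(key, value)
```

If `exact_service_rows` is 0 while `stripped_service_rows` is not, whitespace in the Service values is the culprit; if `hosts_in_both` is empty, the Host keys in the two files do not line up.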
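Comments:
-
The data in test2 looks like one long CSV string (notice that every line starts with ','). Is it supposed to be like that? -
Yes, the data is exactly like that.
-
Then those are not new rows. If it's just one long string containing ',' (or even a list), you will only get one row. -
OK, so is there any way to do this?
-
Lots of ways, assuming you actually have rows of CSV data. Your test1 is valid, i.e. 4 rows of CSV data (one record per line, each ending with '\n'). But in test2, notice how you really only have 2 rows. The first starts with Host and the second with Average. Everything else is one giant CSV row.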
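Whether test2 really has one record per line can be checked directly by comparing the number of non-blank lines after the header with the number of rows pandas parses. A minimal sketch, using a trimmed copy of the test2 sample; `row_counts` and `SAMPLE` are hypothetical names:

```python
import io

import pandas as pd


def row_counts(text):
    """Return (non-blank data lines after the header, rows parsed by pandas)."""
    data_lines = [line for line in text.splitlines()[1:] if line.strip()]
    parsed = pd.read_csv(io.StringIO(text))
    return len(data_lines), len(parsed)


# Trimmed test2 sample (illustrative), one record per physical line.
SAMPLE = """Host,Service, Time OK
server1.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
"""

print(row_counts(SAMPLE))
```

For a real file, read its contents into a string first (for example with `open(path).read()`, where `path` is your file). If pandas reports far fewer rows than there are service lines, the records are probably not separated by real line breaks, which is what the comment above suspects.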
Tags: python python-2.7 python-3.x pandas csv