If the format really is this simple, here is another idea: read the file with a CSV parser, using the colon as the delimiter. Example:
import csv
import itertools
from pprint import pprint

file = 'log.txt'
with open(file, newline='') as fp:
    reader = csv.reader(fp, delimiter=':')
    # keep only "name: value" pairs; this filters out the delimiter lines
    rows = [r for r in reader if len(r) == 2]

# groupby only merges adjacent items, so sort the rows first,
# then group the pairs by their first element into a dict of lists
grouped = {k: [x[1] for x in v] for k, v
           in itertools.groupby(sorted(rows), key=lambda x: x[0])}
pprint(grouped)
This gives you:
{'time_connect': [' 0.460643 ', ' 0.460643 ', ' 0.463243 '],
'time_namelookup': [' 0.121558 ', ' 0.121665 ', ' 0.121668 '],
'time_pretransfer': [' 0.460355 ', ' 0.460755 ', ' 0.460755 '],
'time_redirect': [' 0.000000 ', ' 0.000000 ', ' 0.000000 '],
'time_starttransfer': [' 0.811697 ', ' 0.813697 ', ' 0.911697 '],
'time_total': [' 0.811413 ', ' 0.811813 ', ' 0.811853 ']}
If you need further processing, do it inside the dict comprehension, for example to parse the numbers:
grouped = {k: [float(x[1].strip()) for x in v] for k, v
           in itertools.groupby(sorted(rows), key=lambda x: x[0])}
Output:
{'time_connect': [0.460643, 0.460643, 0.463243],
'time_namelookup': [0.121558, 0.121665, 0.121668],
'time_pretransfer': [0.460355, 0.460755, 0.460755],
'time_redirect': [0.0, 0.0, 0.0],
'time_starttransfer': [0.811697, 0.813697, 0.911697],
'time_total': [0.811413, 0.811813, 0.811853]}
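One thing to be aware of: because sorted(rows) compares whole rows, the values in each list come out in sorted order rather than in the order they appear in the file. If file order matters, a minimal sketch along the following lines may help. It assumes the same "name: value" layout; the function name group_by_name and the inline sample text (including the dashed separator lines) are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def group_by_name(fp):
    """Group 'name: value' lines into {name: [float, ...]}, keeping file order."""
    grouped = defaultdict(list)
    for row in csv.reader(fp, delimiter=':'):
        if len(row) != 2:  # skip separator or blank lines
            continue
        name, value = row
        # float() tolerates the surrounding whitespace in the value
        grouped[name.strip()].append(float(value))
    return dict(grouped)


# Hypothetical sample standing in for log.txt
sample = io.StringIO(
    "time_total: 0.811813 \n"
    "----------\n"
    "time_total: 0.811853 \n"
    "----------\n"
    "time_total: 0.811413 \n"
)
result = group_by_name(sample)
print(result)
```

Since no sorting is involved, this also avoids the need for itertools.groupby altogether.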
pandas
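If you have pandas at hand, you can use it to read the log as a CSV file, which saves you the trouble of parsing and grouping the data yourself. Example: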
import pandas as pd
df = pd.read_csv('log.txt', delimiter=':', header=None, names=['Name', 'Num']).dropna().reset_index(drop=True)
print(df)
This prints the data already parsed and ready to use:
                  Name       Num
0      time_namelookup  0.121668
1         time_connect  0.460643
2     time_pretransfer  0.460755
3        time_redirect  0.000000
4   time_starttransfer  0.811697
5           time_total  0.811813
6      time_namelookup  0.121665
7         time_connect  0.460643
8     time_pretransfer  0.460355
9        time_redirect  0.000000
10  time_starttransfer  0.813697
11          time_total  0.811853
12     time_namelookup  0.121558
13        time_connect  0.463243
14    time_pretransfer  0.460755
15       time_redirect  0.000000
16  time_starttransfer  0.911697
17          time_total  0.811413
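Now do whatever you like with the data, for example reshape the data frame to get a more structured view, with one row per measurement block: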
df['chunk'] = df.index // df.Name.unique().size
print(df.pivot(values='Num', columns='Name', index='chunk'))
# Output:
Name   time_connect  time_namelookup  time_pretransfer  time_redirect  time_starttransfer  time_total
chunk
0          0.460643         0.121668          0.460755            0.0            0.811697    0.811813
1          0.460643         0.121665          0.460355            0.0            0.813697    0.811853
2          0.463243         0.121558          0.460755            0.0            0.911697    0.811413
Or compute statistics for a selected timing:
print(df[df.Name == 'time_total'].describe())
# Output:
            Num
count  3.000000
mean   0.811693
std    0.000243
min    0.811413
25%    0.811613
50%    0.811813
75%    0.811833
max    0.811853
And so on.
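If pandas is not installed, roughly the same summary for time_total can be put together with the standard library's statistics module. This is only a sketch: the three values are copied from the sample output above, and method='inclusive' is chosen on the assumption that it matches the linear interpolation describe() uses by default, which it appears to do for this data.

```python
import statistics

# time_total values copied from the sample output above
time_total = [0.811813, 0.811853, 0.811413]

# method='inclusive' interpolates linearly between the data points
q25, q50, q75 = statistics.quantiles(time_total, n=4, method='inclusive')

summary = {
    'count': len(time_total),
    'mean': statistics.mean(time_total),
    'std': statistics.stdev(time_total),  # sample standard deviation (n - 1)
    'min': min(time_total),
    '25%': q25,
    '50%': q50,
    '75%': q75,
    'max': max(time_total),
}

for name, value in summary.items():
    print(f'{name:<5}  {value:.6f}')
```

statistics.quantiles requires Python 3.8 or later.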