【Question title】: Grep only last line with latest datetimes in a text file [closed]
【Posted】: 2021-10-09 08:47:13
【Question description】:

I have a log file on a Linux system (Red Hat) to which database events are written. The file looks like this:

2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z

I only want the line with the latest datetime for each user (x, y, z), so the result should look like this:

  2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
  2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
  2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z

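【Question comments】:

  • Please add what you have tried
  • grep cannot do this. You could try to build something with sort and uniq, but this is really more of a job for a scripting language: python, perl, awk or similar

Tags: python linux bash awk grep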


【Solution 1】:

We can use awk to get the lines that have a unique value in the last column (see "print unique lines based on field").


To make sure these are the latest lines (by datetime), I assume the following:

  • The file is always sorted from old to new
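
The ordering assumption can be checked before relying on it. The following is a minimal sketch, not part of the original answer: the helper name `is_sorted_by_timestamp` is hypothetical, and it assumes every line starts with a fixed-width `YYYY-MM-DD HH:MM:SS.mmm` stamp and that all lines share the same UTC offset, so plain string comparison matches chronological order.

```python
def is_sorted_by_timestamp(lines):
    """Return True if the leading timestamps never decrease (hypothetical helper)."""
    # The first 23 characters hold the date and time, e.g. "2021-08-04 09:35:00.212".
    # Assumption: fixed-width stamps with one common UTC offset, so comparing
    # the strings gives the same result as comparing the times.
    stamps = [line[:23] for line in lines if line.strip()]
    return all(a <= b for a, b in zip(stamps, stamps[1:]))


sample = [
    "2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x",
    "2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y",
    "2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z",
]
print(is_sorted_by_timestamp(sample))
```

If the check fails, the file would have to be sorted first, or the latest line picked by comparing timestamps rather than by position.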

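Therefore, if we:

  • reverse the file (so it runs new -> old)
  • keep the first line seen for each user
  • reverse it again (back to old -> new)

we get the last failed attempt for each user: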

tac log.txt | awk -F" " '!_[$9]++' | tac

Example on my local machine:

$
$ cat log.txt
2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
$
$ tac log.txt | awk -F" " '!_[$9]++' | tac
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
$
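
For readers who prefer to stay in Python, the same reverse, de-duplicate, reverse idea can be sketched as below. This is an illustration under the same assumption as the pipeline (lines ordered old to new); the function name `last_line_per_user` is hypothetical, and it keys on the last whitespace-separated field, which equals awk's `$9` for this log format.

```python
def last_line_per_user(lines):
    """Keep only the last line seen for each user, preserving file order."""
    seen = set()
    kept = []
    # Walk from newest to oldest, like the first `tac`.
    for line in reversed(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        user = fields[-1]  # last field; same as $9 in this log format
        if user not in seen:  # same effect as awk's '!_[$9]++'
            seen.add(user)
            kept.append(line)
    # Restore old -> new order, like the second `tac`.
    kept.reverse()
    return kept


log = """2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z""".splitlines()

for line in last_line_per_user(log):
    print(line)
```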

【Comments】:

  • I had a similar solution, so I upvoted. You can also achieve this with sort -k2r $file | awk '!array[$NF]++' | sort -k9

【Solution 2】:

See below

from collections import defaultdict
from datetime import datetime

data_str = '''2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z'''
holder = defaultdict(list)
for entry in data_str.split('\n'):
    fields = entry.split(' ')
    holder[fields[-1]].append(datetime.strptime(fields[0] + ' ' + fields[1], '%Y-%m-%d %H:%M:%S.%f'))
for user, date_time_lst in holder.items():
    print(f'{user} --> {max(date_time_lst)}')

Output

x --> 2021-08-04 09:36:05.223000
y --> 2021-08-04 09:37:50.350000
z --> 2021-08-04 09:39:01.372000
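
The question asks for the whole log line rather than only the timestamp. A possible variant, sketched here with a hypothetical function name `latest_line_per_user`, keeps the full line with the largest parsed datetime per user. It does not rely on the file being sorted, but it does assume every line uses the same UTC offset, since the `+03` field is ignored.

```python
from datetime import datetime


def latest_line_per_user(lines):
    """Return, for each user, the full line with the newest timestamp."""
    latest = {}  # user -> (timestamp, line)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumption: the offset field (fields[2]) is identical on every line.
        stamp = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S.%f")
        user = fields[-1]
        if user not in latest or stamp > latest[user][0]:
            latest[user] = (stamp, line)
    # Users come out in order of first appearance in the input.
    return [line for _, line in latest.values()]


unsorted_log = [
    "2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x",
    "2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x",
    "2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y",
    "2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y",
]

for line in latest_line_per_user(unsorted_log):
    print(line)
```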

【Comments】:
