【Posted】: 2017-01-02 01:26:41
【Question】:
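I have a very large log file.
After stripping the timestamp from each line, I sorted and de-duplicated it with sort -u -o logfile logfile, so the log is now clean and tidy: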
failed to correct PL.ASBF..HHZ.2011.348 because of divided by zero
failed to correct PL.ASBF..HHZ.2011.349 because of divided by zero
failed to correct PL.ASBF..HHZ.2011.350 because of divided by zero
.
. (lines not shown here)
.
failed to correct PL.ASBF..HHZ.2015.364 because of divided by zero
failed to correct PL.ASBF..HHZ.2015.365 because of divided by zero
.
.
. (lines not shown here)
.
.
failed to correct PL.HSPB..HHZ.2011.128 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.129 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.130 because of Illegal format
.
. (lines not shown here)
.
failed to correct PL.HSPB..HHZ.2014.364 because of Illegal format
failed to correct PL.HSPB..HHZ.2014.365 because of Illegal format
I can get the items that were logged (for example, PL.HSPB in the sample above) with
grep -oE " [0-9A-Z]*\.[0-9A-Z]*" logfile | sort -u
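However, I also want to see the date information, and to make it easier to read I want to remove the intermediate lines. For example,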
failed to correct PL.HSPB..HHZ.2011.128 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.129 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.130 because of Illegal format
.
. (lines not shown here)
.
failed to correct PL.HSPB..HHZ.2014.364 because of Illegal format
failed to correct PL.HSPB..HHZ.2014.365 because of Illegal format
would become, after removal,
failed to correct PL.HSPB..HHZ.2011.128 because of Illegal format
failed to correct PL.HSPB..HHZ.2014.365 because of Illegal format
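That is, for each item, keep only the first and the last line (the numbers are the year and the Julian day).
Is there a shell command that can do this easily?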
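One possible approach, shown here only as a minimal sketch: walk the sorted file with awk and print a line whenever the item changes, plus the last line seen for the previous item. The function name first_last is hypothetical. The sketch assumes the file is already sorted (so the lines of one item are contiguous and in date order) and that the fourth whitespace-separated field has the form NET.STA..CHA.YEAR.JDAY, as in the sample lines.

```shell
# first_last (hypothetical helper): keep the first and last line of each item.
# Assumptions: input is sorted, and field 4 looks like PL.HSPB..HHZ.2011.128,
# so the item key is the first two dot-separated parts (PL.HSPB).
first_last() {
  awk '
    {
      split($4, part, "[.]")            # part[1]=PL, part[2]=HSPB
      key = part[1] "." part[2]
      if (key != prev) {                # a new item starts on this line
        # close the previous item, unless it consisted of a single line
        if (NR > 1 && last != first) print last
        print                           # first line of the new item
        first = $0
        prev = key
      }
      last = $0                         # remember the most recent line
    }
    END { if (NR > 0 && last != first) print last }
  ' "$@"
}
```

It could be called as first_last logfile, or at the end of a pipeline such as sort -u logfile | first_last. An item with only one line is printed once.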
【Comments】:
-
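@shellter The dates in the log file, such as 2014.365, are 'year.jday'. There is no need to compute the Julian day from the month and day.
-
D'oh, I missed that Y.Jday was already in your original data. Good luck.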
-
Just add another grep to the pipeline that produces your current output? grep "^failed to correct" logfile | grep -oE " [0-9A-Z]*\.[0-9A-Z]*" | sort -u ? Good luck.
Tags: shell logging text text-processing