【Question title】:How to filter apache access log using awk or sed or cut
【Posted】:2015-05-21 20:38:46
【Question】:

This is my apache access log file. I want a count of each unique request path in it.

"2011-09-07 17:00:00" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/"
"2011-09-07 17:00:17" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:21" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:16" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:29" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:22" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:38" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:44" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:33" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:06" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:14" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http
"2011-09-07 17:00:51" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:33" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:45" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:59" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:02:00" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:02:09" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:09" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/

The file above is only a sample; the actual log file keeps growing.
Expected output:

/abc/index.php/contentapi/discontent/  - 3  
/abc/index.php/data/dataContent/  - 3  
/abc/index.php/Api/ApiContent/ - 5  
/abc/index.php/site/siteContent/ - 6  
/abc/index.php/htmlrequest/htmlContent/ - 5  

【Question comments】:

  • What have you tried, and how did it fail?

Tags: awk sed grep cut


【Solution 1】:

I think there may be some typos in the apache log, but how about this:

$ grep -o 'abc/[^ 0-9]*/' apache.log | sort | uniq -c | sort -rn
6 abc/index.php/site/siteContent/
5 abc/index.php/htmlrequest/htmlContent/
5 abc/index.php/Api/ApiContent/
3 abc/index.php/data/dataContent/
2 abc/index.php/contentapi/discontent/
1 abc/index.php/contentapi/
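A minimal sketch of the same grep/sort/uniq pipeline, adjusted to print the question's "path - count" format. It assumes the log file is passed as the first argument, and the function name endpoint_counts is hypothetical. sort -rn compares the counts numerically, so a count of 10 sorts above 9. As in the command above, [^ 0-9]* stops at the first digit, so a path segment that contains a digit would be cut short.

```shell
# Hypothetical helper: counts request-path prefixes in the log file given as $1.
endpoint_counts() {
  # -o prints only the matched text; the leading "/" is kept so the
  # output starts with "/abc/..." as in the expected output.
  # uniq -c prefixes each unique line with its count; sort -rn then
  # orders by that count, highest first; awk swaps to "path - count".
  grep -o '/abc/[^ 0-9]*/' "$1" |
    sort |
    uniq -c |
    sort -rn |
    awk '{ print $2 " - " $1 }'
}
```

For example, endpoint_counts apache.log prints one "path - count" line per distinct prefix.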

【Comments】:

【Solution 2】:

This extracts the fourth space-separated field, which is assumed to be the URL:

cat logfile | awk -F' ' '{print $4}' | awk -F'/' '{print $2"/"$3"/"$4"/"$5}' | sort | uniq -c

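【Comments】:

  • Any time you see a solution that starts with cat file, you know the poster doesn't know shell. Any time you see a solution containing awk ... | awk ..., you know the poster doesn't know awk. Any time you see a solution that uses the same hard-coded string (e.g. "/") several times instead of using , so that OFS separates the output fields, you again know the poster doesn't know awk. Any time you see a solution containing sort | uniq, you again know the poster doesn't know shell. So I would be wary of this solution, since it has many of the common red flags.
  • Then you would also know that there are hundreds of ways to solve a simple problem like this. The key is to strike a balance between "keep it simple, stupid" and accuracy, so that it works in every scenario. When a user asks for a command like this, keeping it simple is probably best.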
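One way to address the red flags listed above is a single awk that reads the file directly and counts in an associative array instead of piping through sort | uniq -c. This is a sketch, assuming the log format shown in the question (the first "/" on each line begins the request path); count_paths is a hypothetical name, and lines with fewer than five "/" characters are skipped.

```shell
# Hypothetical helper: single-pass count of the first four path segments
# for the log file given as $1.
count_paths() {
  awk -F'/' '
    NF >= 6 {
      # With "/" as FS, $2..$5 are the first four path segments,
      # e.g. abc, index.php, contentapi, discontent.
      key = "/" $2 "/" $3 "/" $4 "/" $5 "/"
      cnt[key]++
    }
    END {
      # for-in order is unspecified in awk; pipe to sort if order matters.
      for (key in cnt) print key " - " cnt[key]
    }
  ' "$1"
}
```

Because the counting happens inside awk, the file is read once and no external sort is needed unless a particular output order is required.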
【Solution 3】:

Using GNU awk for gensub():

$ awk '{cnt[gensub(/(([/][^/]+){4}[/]).*/,"\\1","",$4)]++} END{for (url in cnt) print url " - " cnt[url]}' file
/abc/index.php/contentapi/discontent/ - 3
/abc/index.php/data/dataContent/ - 3
/abc/index.php/site/siteContent/ - 6
/abc/index.php/Api/ApiContent/ - 5
/abc/index.php/htmlrequest/htmlContent/ - 5
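gensub() exists only in GNU awk. For awks without it (mawk, BWK awk), here is a sketch of the same idea using the POSIX built-ins match() and substr(), assuming, as Solution 3 does, that $4 is the request path. count_prefixes is a hypothetical name; lines whose path has fewer than four segments followed by "/" are skipped.

```shell
# Hypothetical helper: portable (non-gensub) version for the log file given as $1.
count_prefixes() {
  awk '
    # Dynamic regex given as a string, so "/" needs no escaping.
    # It matches "/seg/seg/seg/seg/" at the start of $4; match() sets
    # RSTART and RLENGTH and returns 0 when there is no match.
    match($4, "^/[^/]+/[^/]+/[^/]+/[^/]+/") {
      cnt[substr($4, RSTART, RLENGTH)]++
    }
    END { for (url in cnt) print url " - " cnt[url] }
  ' "$1"
}
```

As with Solution 3, the order produced by for (url in cnt) is unspecified; append | sort if a stable order is needed.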
    

【Comments】:
