【Question title】: Parse Nginx Ingress Access Log in FluentD Using Multi Format Parser (Regex)
【Posted】: 2020-08-25 12:08:32
【Question description】:

I have an Nginx Ingress Controller in a K8S cluster. It uses the following log format (taken from /etc/nginx/nginx.conf inside the container):

    log_format upstreaminfo '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id';

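My goal is to parse the Nginx logs and push them to CW (CloudWatch). Note that the Nginx log file contains Nginx application logs (for example info and warning logs) as well as access logs. My understanding is that I have to use the multi-format-parser plugin, so I configured FluentD as follows (see the expression in the @nginx filter):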

    <source>
      @type tail
      @id in_tail_container_logs
      @label @containers
      path /var/log/containers/*.log
      exclude_path ["/var/log/containers/cloudwatch-agent*", "/var/log/containers/fluentd*", "/var/log/containers/nginx*"]
      pos_file /var/log/fluentd-containers.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_nginx_container_logs
      @label @nginx
      path /var/log/containers/nginx*.log
      pos_file /var/log/fluentd-nginx.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_cwagent_logs
      @label @cwagentlogs
      path /var/log/containers/cloudwatch-agent*
      pos_file /var/log/cloudwatch-agent.log.pos
      tag *
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <label @containers>
      <filter **>
        @type parser
        key_name log
        format json
        reserve_data true
      </filter>

      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata
      </filter>

      <filter **>
        @type record_transformer
        @id filter_containers_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <filter **>
        @type concat
        key log
        multiline_start_regexp /^\S/
        separator ""
        flush_interval 5
        timeout_label @NORMAL
      </filter>

      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @nginx>
      <filter **>
        @type kubernetes_metadata
        @id filter_nginx_kube_metadata
      </filter>

      <filter **>
        @type record_transformer
        @id filter_nginx_containers_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <filter **>
        @type parser
        key_name log

        <parse>
          @type multi_format

          <pattern>
            format regexp
            expression /^(?<host>[^ ]*) (?<domain>[^ ]*) \[(?<x_forwarded_for>[^\]]*)\] (?<server_port>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+[^\"])(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")? (?<request_length>[^ ]*) (?<request_time>[^ ]*) (?:\[(?<proxy_upstream_name>[^\]]*)\] )?(?:\[(?<proxy_alternative_upstream_name>[^\]]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<request_id>[^ ]*)\n$/
          </pattern>
        </parse>
      </filter>


      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @cwagentlogs>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata_cwagent
      </filter>

      <filter **>
        @type record_transformer
        @id filter_cwagent_stream_transformer
        <record>
          stream_name ${tag_parts[3]}
        </record>
      </filter>

      <filter **>
        @type concat
        key log
        multiline_start_regexp /^\d{4}[-/]\d{1,2}[-/]\d{1,2}/
        separator ""
        flush_interval 5
        timeout_label @NORMAL
      </filter>

      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @NORMAL>
      <match **>
        @type cloudwatch_logs
        @id out_cloudwatch_logs_containers
        region "#{ENV.fetch('REGION')}"
        log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/application"
        log_stream_name_key stream_name
        remove_log_stream_name_key true
        auto_create_stream true
        <buffer>
          flush_interval 5
          chunk_limit_size 2m
          queued_chunks_limit_size 32
          retry_forever true
        </buffer>
      </match>
    </label>

Now I see parser errors for log lines such as the following:

...#0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] \"GET /favicon.ico HTTP/1.1\" 499 0 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0\" 901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9\n'"
..."log"=>"10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] \"GET /favicon.ico HTTP/1.1\" 499 0 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0\" 901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9\n"

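I'm not sure whether the problem is in my regex or in some other part of the configuration. (Note that I haven't added a parser for the Nginx application logs yet!) Thanks.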
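One way to narrow this down is to test the expression against the logged line outside FluentD. The sketch below is an illustration in Python, not part of the original configuration. It assumes Ruby's `(?<name>...)` groups can be rewritten as Python's `(?P<name>...)` without changing what they match, and the group names in the second pattern (`remote_addr`, `remote_user`, and so on) are chosen here to mirror the `log_format` variables. The first pattern is only the beginning of the expression from the config. It expects `host domain [x_forwarded_for] server_port - user [time]`, whereas the `upstreaminfo` format begins with `$remote_addr - $remote_user [$time_local]`. The second pattern follows the `log_format` field by field and has only been checked against this one sample line.

```python
import re

# Sample line copied from the "log" field in the error output above.
SAMPLE = (
    '10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] "GET /favicon.ico HTTP/1.1" 499 0 '
    '"-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) '
    'Gecko/20100101 Firefox/79.0" '
    '901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - '
    '3a3d3bbd02a633aaaab2af3b5284a0c9\n'
)

# Beginning of the expression from the question, converted to Python syntax.
ORIGINAL_PREFIX = re.compile(
    r'^(?P<host>[^ ]*) (?P<domain>[^ ]*) \[(?P<x_forwarded_for>[^\]]*)\] '
    r'(?P<server_port>[^ ]*) - (?P<user>[^ ]*) \[(?P<time>[^\]]*)\]'
)

# Candidate pattern that follows the upstreaminfo log_format field by field.
UPSTREAMINFO = re.compile(
    r'^(?P<remote_addr>[^ ]*) - (?P<remote_user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<request_length>[^ ]*) (?P<request_time>[^ ]*) '
    r'\[(?P<proxy_upstream_name>[^\]]*)\] '
    r'\[(?P<proxy_alternative_upstream_name>[^\]]*)\] '
    r'(?P<upstream_addr>[^ ]*) (?P<upstream_response_length>[^ ]*) '
    r'(?P<upstream_response_time>[^ ]*) (?P<upstream_status>[^ ]*) '
    r'(?P<request_id>\S*)\s*$'
)

# The original prefix already fails on the first few fields of the line.
print(ORIGINAL_PREFIX.match(SAMPLE))  # None

match = UPSTREAMINFO.match(SAMPLE)
if match:
    for name, value in match.groupdict().items():
        print(f'{name} = {value!r}')
```

If the candidate pattern is carried back into the FluentD config, the groups need Ruby's `(?<name>...)` syntax again. Lines with several upstream addresses or other shapes not seen in this sample may still need adjustments.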

【Question discussion】:

    Tags: regex nginx nginx-ingress fluentd


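    【Solution 1】:

    Not an answer per se, because I believe the regex isn't quite right. But since I have access to Nginx, I simply changed the log format to JSON instead of parsing it with a regex: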

    'log-format-upstream': '{ "app": "nginx", "time":"$time_iso8601", "remote_addr":"$remote_addr", "remote_user":"$remote_user", "forwarded_for":"$http_x_forwarded_for", "host":"$host", "res_status":"$status", "res_body_size":"$body_bytes_sent", "res_size":"$bytes_sent", "req_id":"$req_id", "req_uri":"$uri", "req_time":"$request_time", "req_proto":"$server_protocol", "req_query":"$query_string", "req_length":"$request_length", "req_method":"$request_method", "agent":"$http_user_agent", "up_name": "$proxy_upstream_name", "up_addr": "$upstream_addr", "up_res_status": "$upstream_status", "up_res_time": "$upstream_response_time", "up_res_length": "$upstream_response_length" }'
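With this format each access-log line is a self-contained JSON object, so a JSON parser can replace the regex. The Python sketch below shows the idea under some assumptions: `LINE` is a hypothetical rendered line (a shortened version of the template above, filled in with values from the sample entry in the question), and `parse_access_line` is an illustrative helper, not part of FluentD or Nginx. Lines that are not JSON, such as the Nginx application logs mentioned in the question, are returned as `None` so they can be handled separately.

```python
import json

# Hypothetical rendered access-log line: a shortened version of the JSON
# template above, filled in with values from the sample entry in the question.
LINE = (
    '{ "app": "nginx", "time":"2020-08-25T11:43:09+00:00", '
    '"remote_addr":"10.0.1.2", "res_status":"499", "res_body_size":"0", '
    '"req_id":"3a3d3bbd02a633aaaab2af3b5284a0c9", "req_time":"0.000", '
    '"up_name": "develop-api-8080", "up_addr": "10.0.2.3:8080" }\n'
)


def parse_access_line(line):
    """Parse one JSON access-log line; return None for non-JSON lines."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # e.g. a plain-text Nginx application log line
    if not isinstance(record, dict):
        return None
    # The template quotes every value, so numbers arrive as strings.
    if "res_status" in record:
        record["res_status"] = int(record["res_status"])
    if "req_time" in record:
        record["req_time"] = float(record["req_time"])
    return record


record = parse_access_line(LINE)
print(record["res_status"], record["up_name"])
print(parse_access_line("plain text application log line\n"))  # None
```

One caveat: a value that contains a double quote (a user agent, for instance) would break the JSON unless Nginx escapes it. As far as I know, ingress-nginx has a `log-format-escape-json` ConfigMap option for this, but verify it against the controller version in use.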
    
