Filtering Filebeat input with or without Logstash

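【Question title】: Filtering Filebeat input with or without Logstash
【Posted】: 2020-04-17 19:31:14
【Question】:

In our current setup we use Filebeat to ship logs to an Elasticsearch instance. The application logs are in JSON format, and the application runs in AWS.

For some reason, AWS decided to prefix the log lines in a new platform version, and now log parsing no longer works: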

Apr 17 06:33:32 ip-172-31-35-113 web: {"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}

Previously it was just:

{"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}

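The question is whether we can avoid using Logstash to convert the log lines back to the old format. If not, how do I remove the prefix? Which filter is the best choice for this?

My current Filebeat configuration looks like this: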

filebeat.inputs:
  - type: log
    paths:
    - /var/log/web-1.log
    json.keys_under_root: true
    json.ignore_decoding_error: true
    json.overwrite_keys: true
    fields_under_root: true
    fields:
      environment: ${ENV_NAME:not_set}
      app: myapp

cloud.id: "${ELASTIC_CLOUD_ID:not_set}"
cloud.auth: "${ELASTIC_CLOUD_AUTH:not_set}"

【Question comments】:

Tags: elasticsearch logstash amazon-elastic-beanstalk filebeat

【Solution 1】:

I would try the dissect and decode_json_fields processors:

processors:
  # first ignore the preamble and only keep the JSON data
  - dissect:
      tokenizer: "%{?ignore} %{+ignore} %{+ignore} %{+ignore} %{+ignore}: %{json}"
      field: "message"
      target_prefix: ""

  # then parse the JSON data
  - decode_json_fields:
      fields: ["json"]
      process_array: false
      max_depth: 1
      target: ""
      overwrite_keys: false
      add_error_key: true

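【Comments】:

I haven't tested this solution, but off the top of my head it should work. Have you tried it?
Trying it now. Should I put it under the input or at the global level? I guess it shouldn't matter.
... Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0xc009ce, Device:0xca01}}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"mapper_parsing_exception","reason":"object mapping for [json] tried to parse field [json] as object, but found a concrete value"}
That's because your index probably already has a field called json of type string. Maybe use another name, such as json_tmp or anything else.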
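
To check the pattern offline before touching the Filebeat config, here is a rough Python emulation of the two steps (split off the prefix, then decode the JSON). It is only a sketch of the idea, not Filebeat's implementation: the helper names `dissect` and `process` are made up for illustration, the sample line is shortened, the staging field uses the `json_tmp` name suggested in the comment above, and the sketch deletes that field after a successful decode, which the config above does not do.

```python
import json

# Shortened version of the prefixed log line from the question.
SAMPLE = (
    'Apr 17 06:33:32 ip-172-31-35-113 web: '
    '{"@timestamp":"2020-04-17T06:33:32.691Z","message":"Tomcat started",'
    '"level":"INFO","level_value":20000}'
)


def dissect(line, delimiters):
    """Split line on each delimiter in order; return None if one is missing."""
    pieces = []
    rest = line
    for delim in delimiters:
        head, sep, rest = rest.partition(delim)
        if not sep:
            return None
        pieces.append(head)
    pieces.append(rest)
    return pieces


def process(event):
    """Strip the prefix from event['message'] and merge the JSON into the event."""
    # Four single spaces, then ": ", mirroring the delimiters in the tokenizer.
    pieces = dissect(event["message"], [" ", " ", " ", " ", ": "])
    if pieces is None:
        return event  # line does not carry the prefix; leave it untouched
    event["json_tmp"] = pieces[-1]  # staging field, named to avoid the mapping clash
    try:
        decoded = json.loads(event["json_tmp"])
    except json.JSONDecodeError as exc:
        event["error"] = str(exc)  # loosely analogous to add_error_key
        return event
    if not isinstance(decoded, dict):
        return event
    for key, value in decoded.items():
        event.setdefault(key, value)  # existing keys win, like overwrite_keys: false
    del event["json_tmp"]
    return event


event = process({"message": SAMPLE})
print(event["level"], event["level_value"])
```

Note that in this emulation the raw line stays in `message`, because existing keys are not overwritten. If the JSON's own `message` should replace it, that is worth checking against the `overwrite_keys` setting in the real config.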

【Solution 2】:

Logstash has a plugin called the JSON filter, which parses a raw log line held in a field (one called "message", for instance).

filter {
    json {
        source => "message"
    }
}

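If you don't want to include the beginning of the line, use the dissect filter in Logstash, and then point the json filter's source at the field that dissect produces. It should look something like this: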

filter {
    dissect {
        mapping => {
            "message" => "%{}: %{message_without_prefix}"
        }
    }
}
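
Why the prefix has to go before the JSON is parsed can be seen in a few lines of Python. This is just an illustration of the logic, not of how Logstash works internally; `strip_prefix` and `parse_json` are hypothetical helpers and the sample line is shortened.

```python
import json

# Shortened version of the prefixed log line from the question.
RAW = 'Apr 17 06:33:32 ip-172-31-35-113 web: {"level":"INFO","level_value":20000}'


def strip_prefix(line):
    """Return the text after the first ': ', or None if there is no such delimiter."""
    _, sep, rest = line.partition(": ")
    return rest if sep else None


def parse_json(text):
    """Return the decoded JSON value, or None if text is missing or not valid JSON."""
    if text is None:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


raw_result = parse_json(RAW)                      # prefix still present
stripped_result = parse_json(strip_prefix(RAW))   # prefix removed first
print(raw_result, stripped_result)
```

The first ": " in the line is the one after "web", because the colons inside the timestamp are not followed by a space, so splitting there leaves exactly the JSON payload.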

Filebeat may have both of these capabilities too. But in my experience, I prefer Logstash for parsing and manipulating log data.

【Comments】: