【Question title】: Logstash XML Parse Failed
【Posted】: 2019-08-17 07:59:54
【Question description】:

I am running the latest ELK stack, 6.6, on the deviantony/docker-elk image. I have the following XML file, which I am trying to parse into an ES JSON object:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <ChainId>7290027600007</ChainId>
    <SubChainId>001</SubChainId>
    <StoreId>001</StoreId>
    <BikoretNo>9</BikoretNo>
    <DllVerNo>8.0.1.3</DllVerNo>
</root>

My config file is:

input {
  file {
    path => "/usr/share/logstash/logs/example1.xml"
    type => "xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "<?xml version"
      negate => true
      what => "previous"
    }
  }
}

filter {
    xml {
        source => "message"
        store_xml => false
        xpath => [ "/root/ChainId/text()", "ChainId" ]
    }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "xml_index"
    manage_template => false
  }
}

My Logstash output is:

{
logstash_1 | "@timestamp" => 2019-03-26T06:45:27.941Z,
logstash_1 | "tags" => [
logstash_1 | [0] "multiline"
logstash_1 | ],
logstash_1 | "host" => "751b3a8bf341",
logstash_1 | "ChainId" => [],
logstash_1 | "message" => "\r\n\r\n 7290027600007\r\n 001\r\n 001\r\n 9\r\n 8.0.1.3\r\n \r",
logstash_1 | "path" => "/usr/share/logstash/logs/example1.xml",
logstash_1 | "@version" => "1",
logstash_1 | "type" => "xml"
logstash_1 | }

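The XML body under message shows up as a string with escaped characters and \r\n sequences, and the ChainId field populated by the XPath expression comes back as an empty array. I tried other XML files as well, with the same result.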

Update: after trying to remove the \r\n characters, I still do not get the XPath-parsed fields. My output is:

logstash_1 | "message" => "7290027600007001001 98.0.1.3",
logstash_1 | "StoreId" => [],
logstash_1 | "BikoretNo" => [],
logstash_1 | "ChainId" => [],
logstash_1 | "type" => "xml",
logstash_1 | "tags" => [
logstash_1 | [0] "multiline"
logstash_1 | ],
logstash_1 | "@timestamp" => 2019-03-27T20:51:09.575Z,
logstash_1 | "DllVerNo" => [],
logstash_1 | "path" => "/usr/share/logstash/logs/example1.xml",
logstash_1 | "host" => "751b3a8bf341",
logstash_1 | "SubChainId" => [],
logstash_1 | "@version" => "1"
logstash_1 | }

【Question discussion】:

    Tags: xml elasticsearch logstash elastic-stack


    【Solution 1】:

    Use the gsub option of the mutate filter to remove the special characters from the message.

    mutate { 
            gsub => [ "message", "[\r\n]", "" ] 
        }
    

    Add a target setting to the xml filter to specify where the parsed data is placed.

    filter {
    
        xml{
            source => "message"
            store_xml => false
            target => "root"
    
        }
    
    }
    

    Here is the complete working Logstash conf file.

    input
    {
        file
            {
                path => "C:\Users\KZAPAGOL\Desktop\CSV\XMLFile.xml"
                start_position => "beginning"
                sincedb_path => "/dev/null"
                exclude => "*.gz"
                type => "xml"
                codec => multiline {
                        pattern => "<?xml " 
                        negate => "true"
                        what => "previous"
                    }
            }
    }
    
    filter {
    
        xml{
            source => "message"
            store_xml => false
            target => "root"
            xpath => [
                "/root/ChainId/text()", "ChainId",
                "/root/SubChainId/text()", "SubChainId",
                "/root/StoreId/text()", "StoreId",
                "/root/BikoretNo/text()", "BikoretNo",
                "/root/DllVerNo/text()", "DllVerNo"
            ]
        }
    
        mutate { 
            gsub => [ "message", "[\r\n]", "" ] 
        }
    }
    
    output{
    
    elasticsearch{
            hosts => ["http://localhost:9200/"]
            index => "parse_xml"
        }
    
        stdout
        {
            codec => rubydebug
        }
    }
    

    Output

    {
      "_index": "parse_xml",
      "_type": "doc",
      "_id": "vNj4v2kBZ2Q_C9FO94eF",
      "_version": 1,
      "_score": null,
      "_source": {
        "@timestamp": "2019-03-27T16:25:58.379Z",
        "path": "filePath",
        "tags": [
          "multiline"
        ],
        "ChainId": [
          "7290027600007"
        ],
        "BikoretNo": [
          "9"
        ],
        "DllVerNo": [
          "8.0.1.3"
        ],
        "host": "xxxx",
        "@version": "1",
        "SubChainId": [
          "001"
        ],
        "message": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>    <ChainId>7290027600007</ChainId>    <SubChainId>001</SubChainId>    <StoreId>001</StoreId>    <BikoretNo>9</BikoretNo>    <DllVerNo>8.0.1.3</DllVerNo></root>",
        "type": "xml",
        "StoreId": [
          "001"
        ]
      },
      "fields": {
        "@timestamp": [
          "2019-03-27T16:25:58.379Z"
        ]
      },
      "sort": [
        1553703958379
      ]
    }
    

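    【Discussion】:

    • @wizard Please let me know if it does not work for you. Thanks!
    • Unfortunately not. Please check my updated log output. P.S. I see you tried this conf on a Windows machine. I am running it on Linux (Docker and my PC).
    • Can you try this: /root/ChainId[text()]
    • No, the XPath still had no effect.
    【Solution 2】:

    I tried your configuration and it works in a Windows environment. The same thing happened to me once, and I fixed it by changing the XPath expression.

    Try changing the XPath expression to one of the following: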

    xpath => [ "//*[local-name() = 'ChainId']/text()", "ChainId" ]
    

    xpath => [ "//ChainId/text()", "ChainId" ]
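
    Before changing expressions, it can help to confirm outside Logstash that the document itself parses and that the element names are what the XPath expects. The following is only a sketch using Python's standard library, not part of the original answer; SAMPLE is the XML from the question and extract_fields is a hypothetical helper name. ElementTree supports only a subset of XPath, so the sketch iterates over the children of the root element instead of evaluating /root/ChainId/text().

```python
import xml.etree.ElementTree as ET

# The sample document from the question, as bytes with no byte order mark.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
    <ChainId>7290027600007</ChainId>
    <SubChainId>001</SubChainId>
    <StoreId>001</StoreId>
    <BikoretNo>9</BikoretNo>
    <DllVerNo>8.0.1.3</DllVerNo>
</root>
"""


def extract_fields(xml_bytes: bytes) -> dict:
    """Map each direct child tag of the root element to its text content."""
    root = ET.fromstring(xml_bytes)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    for tag, text in extract_fields(SAMPLE).items():
        print(tag, "=>", text)
```

    If the printed tags carry a {namespace} prefix, the document uses XML namespaces, which is the situation the local-name() expression above is meant for. If the tags are plain, as with this sample, the path is fine and the cause is more likely in how the file reaches the filter.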
    

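    【Discussion】:

    • @wizard Try adding remove_namespaces => "true" to the xml filter and changing sincedb_path => "NUL"
    • I don't see any change.
    • @wizard Did you stop and start Logstash after updating the configuration?
    • Yes, of course. I have --config.reload.automatic set, and I also tried stopping and restarting.
    • OK, I'm not sure what changed, but it now partially works: the XML file is parsed only when I reload the conf, but when I add a new XML file to the folder it returns empty arrays again...
    【Solution 3】: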

    My XML file was encoded as UTF-8 with a BOM rather than plain UTF-8. Problem solved!
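
    For anyone who wants to check for the same cause, here is a minimal sketch in Python, not taken from the original answer, that detects a leading UTF-8 byte order mark and rewrites the file without it. The function names strip_utf8_bom and rewrite_without_bom are hypothetical, and the sketch assumes the files are small enough to read into memory and that rewriting them in place is acceptable.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(path: str) -> bool:
    """Rewrite the file at path without a BOM. Return True if a BOM was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # no BOM found, file left untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

    Calling rewrite_without_bom on each XML file before it is dropped into the directory that Logstash watches would be one way to apply this; saving the files as UTF-8 without a BOM at the source avoids the extra step.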

    【Discussion】:
