【Question title】: Mysterious EOF error message while updating a SOLR index
【Posted】: 2022-03-24 01:54:55
【Question】:

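I am using atomic updates to update metadata in a collection of SOLR documents. To do this, I use an external .json file that lists every document ID in the collection together with its possible metadata, and I submit the requested updates with the "set" command. However, whenever the external file is larger than about 8200 bytes / 220 lines, I get the following error message: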
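For readers unfamiliar with atomic updates, here is a minimal Python sketch of what such an update file might contain. The document IDs and the field names `author_s` and `year_i` are hypothetical placeholders, not taken from the poster's collection; only the `{"set": value}` modifier comes from the question.

```python
import json


def build_atomic_updates(metadata):
    """Turn {doc_id: {field: value}} into a list of Solr atomic-update documents."""
    updates = []
    for doc_id, fields in metadata.items():
        doc = {"id": doc_id}
        for field, value in fields.items():
            # The "set" modifier replaces the stored value of the field.
            doc[field] = {"set": value}
        updates.append(doc)
    return updates


# Hypothetical IDs and field names, for illustration only.
example = build_atomic_updates({
    "doc-001": {"author_s": "Smith", "year_i": 2019},
    "doc-002": {"author_s": "Jones"},
})
payload = json.dumps(example, indent=2)
print(payload)
```

Writing `payload` to a file would give something that can be posted to the update handler in the same way as `test5.json`.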

"org.apache.solr.common.SolrException: Cannot parse provided JSON: Unexpected EOF: char=(EOF),position=8191 BEFORE=''"

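This does not seem to be related to the actual content of the file (a missing bracket or the like), because I reproduced it with a different database. Moreover, if I cut the external file into smaller files of less than 8000 bytes each, the update works perfectly. Does anyone know where this might come from?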
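The splitting workaround described above can be automated. The following is a rough Python sketch, assuming the update file is a JSON array of atomic-update documents; the function name `split_into_batches` and the 8000-byte budget are illustrative choices, and this only works around the limit rather than explaining it.

```python
import json


def split_into_batches(docs, max_bytes):
    """Group docs into JSON arrays whose UTF-8 size stays within max_bytes.

    A single document that is larger than max_bytes on its own still gets
    a batch to itself, so that batch can exceed the budget.
    """
    batches, current = [], []
    for doc in docs:
        candidate = current + [doc]
        size = len(json.dumps(candidate).encode("utf-8"))
        if current and size > max_bytes:
            # Adding this doc would overflow: close the current batch.
            batches.append(json.dumps(current))
            current = [doc]
        else:
            current = candidate
    if current:
        batches.append(json.dumps(current))
    return batches
```

Each returned string is a complete JSON array that can be written to its own file and posted separately. The sketch re-serializes the growing batch for every document, which is simple but quadratic; that is acceptable for a few hundred lines of metadata and would need rethinking for very large inputs.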

The curl command used to update the collection is:

curl 'http://localhost:8983/solr/these/update/json?commit=true' -d @test5.json

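The main SOLR configuration file is included below. I can provide the json update file if needed, along with any other information.

Thanks in advance for your help,

Barthélemy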

    <?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!-- 
 This is a DEMO configuration highlighting elements
 specifically needed to get this example running
 such as libraries and request handler specifics.

 It uses defaults or does not define most of production-level settings
 such as various caches or auto-commit policies.

 See Solr Reference Guide and other examples for
 more details on a well configured solrconfig.xml
 https://cwiki.apache.org/confluence/display/solr/The+Well-Configured+Solr+Instance
-->

<config>
  <!-- Controls what version of Lucene various components of Solr
   adhere to.  Generally, you want to use the latest version to
   get all bug fixes and improvements. It is highly recommended
   that you fully re-index after changing this setting as it can
   affect both how text is indexed and queried.
  -->
  <luceneMatchVersion>6.6.0</luceneMatchVersion>

  <!-- Load Data Import Handler and Apache Tika (extraction) libraries -->
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/contrib/langid/lib" regex=".*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-.*\.jar"/>

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
  </requestHandler>

  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">tika-data-config.xml</str>
    </lst>
  </requestHandler>


  <updateRequestProcessorChain name="langid" default="true" onError = "skip">
     <processor  class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"
       onError = "continue">
       <str name="langid.fl">text</str>
       <str name="langid.langField">language_s</str>
       <str name="langid.threshold">0.8</str>
       <str name="langid.fallback">en</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" onError = "skip"/>
     <processor class="solr.RunUpdateProcessorFactory" onError = "skip"/>
   </updateRequestProcessorChain>

<!-- The default high-performance update handler -->
  <updateHandler class="solr.DirectUpdateHandler2">

    <!-- Enables a transaction log, used for real-time get, durability,
         and solr cloud replica recovery.  The log can grow as big as
         uncommitted changes to the index, so use of a hard autoCommit
         is recommended (see below).
         "dir" - the target directory for transaction logs, defaults to the
                solr data directory.   -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>

  </updateHandler>

</config>

【Comments】:

    Tags: json solr updates eof


    【Solution 1】:

    Try editing server/etc/jetty.xml and adjusting requestHeaderSize:

        <Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="8192" /></Set>
    

    to something larger than the size of your file.

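    【Comments】:

    • No improvement, but I think we are getting close to the problem. I found a post about an upload limit that curl applies for security reasons: stackoverflow.com/questions/31941213/…. Could other parameters be configured in the same way?
    • Another reference on this issue: maxchadwick.xyz/blog/http-request-header-size-limits
    • Sure, if you hit a limit before the request reaches Solr (through Jetty), you have to sort that out first. It can be a bit tricky; on the Solr side, the parameters changed names at some point, and so on.
    • Hmm, Apache is known to have an 8Mb request upload limit, but I don't know how to increase it. Is Jetty the HTTP server that answers the curl command, or is there some other intermediary service where such a limit might exist?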
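    【Solution 2】:

    I don't know whether this will solve it for anyone else who runs into this, but I had the same problem.

    My initial command looked like this: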

    curl http://localhost:8983/solr/your_solr_core/update?commit=true --data-binary @test5.json -H "Content-type:application/json"
    

    Changing it to the following solved the problem:

    curl http://localhost:8983/solr/your_solr_core/update?commit=true -H "Content-Type: application/json" -T "test5.json" -X POST
    

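    Apparently this has to do with curl loading the entire file into memory with the first command, which causes the problem, whereas the second command uses minimal memory.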
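    For anyone scripting the upload rather than calling curl, a rough Python analogue of the second command might look like the sketch below. It uses only the standard library; the helper name `build_update_request` is made up for illustration, and whether streaming the body is what actually avoids the EOF error is the answerer's guess, not something verified here.

```python
import os
import urllib.request


def build_update_request(url, json_path):
    """Prepare (but do not send) a POST that streams json_path as the body.

    The caller is responsible for closing request.data afterwards.
    """
    # Passing a file object lets the HTTP client read and send it in blocks
    # instead of holding the whole file in memory.
    body = open(json_path, "rb")
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(os.path.getsize(json_path)),
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Sending it requires a running Solr instance, so it is left commented out:
# req = build_update_request(
#     "http://localhost:8983/solr/your_solr_core/update?commit=true",
#     "test5.json",
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
# req.data.close()
```

    The explicit `Content-Type: application/json` header mirrors the `-H` option in both curl commands above.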

    【Comments】:
