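【Posted】: 2017-03-01 14:09:41
【Problem description】:
I have recently been learning Nutch. After finishing the Nutch and Solr setup, I tried to crawl with Nutch and index the results into Solr. The indexing job fails with the following error: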
SolrIndexerJob: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Expected content type application/octet-stream but got text/html;charset=iso-8859-1.
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:455)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:146)
at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:124)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:186)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)
I would appreciate any advice. Thanks in advance.
【Discussion】:
-
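The error above occurs when I run the command bin/crawl urls localhost:8983/solr 2. When I change the command to bin/crawl urls localhost:8983/solr/collection1 2, the error no longer occurs, but the data crawled by Nutch does not seem to end up in Solr.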
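One way to narrow this down is to check the Solr side directly. The text/html response in the error suggests the URL without a core name was answered by an HTML page rather than by an update handler, which would fit with the error going away once collection1 is appended. The sketch below is only an illustration: the helper names core_name, select_url and count_docs are made up for this example, it assumes Solr is mounted under /solr, and it assumes the full URL with an http:// scheme is used. It checks whether a URL includes a core name and asks that core's standard select handler how many documents it holds.

```python
import json
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen


def core_name(solr_url):
    """Return the path segment after /solr, or None if the URL stops at /solr.

    Assumes Solr is served under a /solr context path (an assumption,
    not something the post confirms).
    """
    parts = [p for p in urlsplit(solr_url).path.split("/") if p]
    if "solr" in parts:
        rest = parts[parts.index("solr") + 1:]
        return rest[0] if rest else None
    return None


def select_url(solr_url):
    """Build a match-all query URL that returns only the document count."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return solr_url.rstrip("/") + "/select?" + query


def count_docs(solr_url, timeout=10):
    """Ask the core how many documents it currently holds (needs a running Solr)."""
    with urlopen(select_url(solr_url), timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


if __name__ == "__main__":
    url = "http://localhost:8983/solr/collection1"  # URL taken from the post
    if core_name(url) is None:
        print("URL has no core name; the indexer will likely get an HTML page back")
    else:
        print("documents in core:", count_docs(url))
```

If the count stays at 0 after a crawl round, the documents probably never reached the index, and the Nutch log (logs/hadoop.log in a default local setup) is the next place to look for the indexing step.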