【Question title】: StormCrawler / Elasticsearch / Apache Tika for parsing PDFs: getting an error when running the topology
【Posted】: 2023-04-05 23:48:02
【Question】:

I get the following error when running the es-crawler.flux topology. I'm not sure what I'm doing wrong. I don't think there is a YAML error?


  **I added the Apache Tika module as a dependency in the pom.xml file.**


    <!-- Add tika dependency -->
    <dependency>
        <groupId>com.digitalpebble.stormcrawler</groupId>
        <artifactId>storm-crawler-tika</artifactId>
        <version>${stormcrawler.version}</version>
    </dependency>

I updated the es-crawler.flux file as referenced here: https://gist.github.com/jnioche/3f09c2e3f7da845181b733253bc806f1

I ran the topology and **got the following results:**

```
Exception in thread "main" Cannot create property=streams for JavaBean=org.apache.storm.flux.model.TopologyDef@65e98b1c
    in 'string', line 1, column 1:
        name: "devcrawler"
        ^
Cannot create property=grouping for JavaBean=org.apache.storm.flux.model.StreamDef@1ff4931d
    in 'string', line 94, column 5:
      - from: "shunt"
        ^
    in 'string', line 97, column 7:
        type: LOCAL_OR_SHUFFLE
        ^
Unable to find property 'streamid' on class: org.apache.storm.flux.model.GroupingDef
    in 'string', line 98, column 17:
        streamid: "tika"
                  ^
    in 'string', line 97, column 7:
        type: LOCAL_OR_SHUFFLE
        ^
    in 'string', line 63, column 3:
    - from: "spout"
    ^

    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:292)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:171)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:331)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObjectNoCheck(BaseConstructor.java:230)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:219)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:173)
    at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:157)
    at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:472)
    at org.yaml.snakeyaml.Yaml.load(Yaml.java:398)
    at org.apache.storm.flux.parser.FluxParser.loadYaml(FluxParser.java:168)
    at org.apache.storm.flux.parser.FluxParser.parseInputStream(FluxParser.java:114)
    at org.apache.storm.flux.parser.FluxParser.parseFile(FluxParser.java:68)
    at org.apache.storm.flux.Flux.runCli(Flux.java:167)
    at org.apache.storm.flux.Flux.main(Flux.java:119)
Caused by: Cannot create property=grouping for JavaBean=org.apache.storm.flux.model.StreamDef@1ff4931d
    in 'string', line 94, column 5:
      - from: "shunt"
        ^
Cannot create property=streamid for JavaBean=org.apache.storm.flux.model.GroupingDef@710f4dc7
    in 'string', line 97, column 7:
        type: LOCAL_OR_SHUFFLE
        ^
Unable to find property 'streamid' on class: org.apache.storm.flux.model.GroupingDef
    in 'string', line 98, column 17:
        streamid: "tika"
    in 'string', line 97, column 7:
        type: LOCAL_OR_SHUFFLE
```

【Discussion】:

Tags: maven elasticsearch apache-tika stormcrawler


【Solution 1】:

I copied the Flux file from the Gist above and it ran without any problem. Maybe the lines in your file are not aligned correctly (e.g. missing spaces)?
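
To illustrate the kind of alignment in question, here is a minimal sketch of how the stream definition named in the error might be laid out. It is not copied from the Gist: the component names `shunt` and `tika` come from the error output, while the exact indentation widths and the `to` target are assumptions. In Flux, `type` and `streamId` are properties of the `grouping` mapping, so they sit one level deeper than `grouping` and line up with each other. As far as I know, the Flux model class spells the property `streamId` and property names are matched case-sensitively, so it is worth checking the spelling as well as the spaces.

```yaml
streams:
  # Hypothetical stream entry; names taken from the error output above.
  - from: "shunt"
    to: "tika"                 # assumed target bolt id
    grouping:                  # same indentation as "from" and "to"
      type: LOCAL_OR_SHUFFLE   # nested one level under "grouping"
      streamId: "tika"         # aligned with "type"; note the capital "I"
```

Comparing your file against the Gist with a diff tool that shows whitespace is a quick way to spot a line that has drifted out of its mapping.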

【Comments】:

• Yes, it was an alignment issue! After I made the appropriate changes it ran successfully. Thank you very much.