【Question title】: XPath code throws IOException
【Posted】: 2015-09-26 20:54:26
【Question】:

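I have a text document in which every line is a complete US patent XML document. I'm trying to parse it to extract certain fields, such as the patent number. I haven't used XPath before, so I borrowed some code by Ravi Thapliyal that I found in Parse XML Simple String using Java XPath. However, the initial !DOCTYPE tag apparently causes the DocumentBuilder to try to find the actual document somewhere?

Here is my first attempt at the code: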

// Convert the entire file to an ArrayList of strings, one patent document per line.
// input is a Scanner on the file and db is a DocumentBuilder; both are created earlier (not shown).
ArrayList<String> doc = new ArrayList<>();
while (input.hasNextLine()) {
    doc.add(input.nextLine().trim());
}

int index = 0;
while (index < doc.size()) {
    String xml = doc.get(index);
    XPathFactory xpathFactory = XPathFactory.newInstance();
    XPath xPath = xpathFactory.newXPath();
    InputSource source = new InputSource(new StringReader(xml));

    // Return an empty DTD instead of loading us-patent-grant-v40-2004-12-02.dtd from disk
    db.setEntityResolver(new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, java.io.IOException {
            if (systemId.contains("us-patent-grant-v40-2004-12-02.dtd")) {
                return new InputSource(new StringReader(""));
            } else {
                return null;
            }
        }
    });

    String orgName = "";
    try {
        orgName = (String) xPath.evaluate("/agents/adressbook/orgname", source, XPathConstants.STRING);
    } catch (Exception e) {
        e.printStackTrace();
    }

    System.out.println("Document #" + index + " Company: " + orgName);
    index++; // move on to the next line; without this the loop never ends
} // end while loop that goes through each line (patent document) in the file

Near the start of each line in the input file is the following DOCTYPE declaration: <!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v40-2004-12-02.dtd" []>

The line that causes the problem (line 91) is:

orgName = (String) xPath.evaluate("/agents/adressbook/orgname", source, XPathConstants.STRING);

The stack trace is:

java.io.FileNotFoundException: C:\Users\Dave\Documents\NetBeansProjects\ParseXML\us-patent-grant-v40-2004-12-02.dtd (The system cannot find the file specified)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:131)
    at java.io.FileInputStream.<init>(FileInputStream.java:87)
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:616)
Document #0 Company: 
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1293)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1260)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:263)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1164)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1050)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:938)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:466)
    at Parser.main(Parser.java:102)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException: java.io.FileNotFoundException: C:\Users\Dave\Documents\NetBeansProjects\ParseXML\us-patent-grant-v40-2004-12-02.dtd (The system cannot find the file specified)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:473)
    at Parser.main(Parser.java:102)
Caused by: java.io.FileNotFoundException: C:\Users\Dave\Documents\NetBeansProjects\ParseXML\us-patent-grant-v40-2004-12-02.dtd (The system cannot find the file specified)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:131)
    at java.io.FileInputStream.<init>(FileInputStream.java:87)
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:616)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1293)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1260)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:263)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1164)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1050)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:938)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:466)
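The failure can probably be reproduced without the patent file. The sketch below assumes only the JDK's built-in JAXP implementation; the class name `ReproduceMissingDtd` and the trimmed stub document are made up for illustration. Passing an `InputSource` to `XPath.evaluate` makes XPath parse the text itself, and since the string has no base URI, the relative DTD name appears to be resolved against the working directory, which matches the NetBeans project path in the trace above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ReproduceMissingDtd {

    // Hypothetical stub with the same DOCTYPE as the patent lines, everything else trimmed away.
    static final String XML = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE us-patent-grant SYSTEM \"us-patent-grant-v40-2004-12-02.dtd\" []>"
            + "<us-patent-grant/>";

    // Returns true if XPath's own internal parse fails, e.g. because the DTD file is missing.
    static boolean evaluateFails(String xml) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            // XPath parses the InputSource with its own DocumentBuilder, which loads the external DTD.
            xPath.evaluate("/us-patent-grant", new InputSource(new StringReader(xml)), XPathConstants.STRING);
            return false;
        } catch (XPathExpressionException e) {
            System.out.println("Cause: " + e.getCause());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Failed: " + evaluateFails(XML));
    }
}
```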

Can anyone help me figure out what I should do to parse the document from a string?

【Question comments】:

    Tags: java xml xpath xml-parsing


    【Solution 1】:

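    Try setting the parser features, or provide an empty EntityResolver.

    For the features, you need to find out which parser implementation you are using (they are implementation-specific).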

    Make DocumentBuilder.parse ignore DTD references
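For the features route, here is a minimal sketch. It assumes the JDK's bundled Xerces-based parser, which recognizes the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`; the class name, method name, sample XML and XPath expression are hypothetical. The document is parsed with your own `DocumentBuilder` and the resulting `Document` is passed to `XPath.evaluate`. Passing the `InputSource` instead would let XPath create its own parser without this feature.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnoreDtdFeature {

    // Parse one line of XML without fetching its external DTD, then run an XPath query on the DOM.
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false); // the default, but the feature below only applies when not validating
        // Xerces-specific feature: skip loading the external DTD.
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(new InputSource(new StringReader(xml)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        // Evaluate against the parsed Document so XPath does not reparse the text itself.
        return (String) xPath.evaluate(expression, document, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE us-patent-grant SYSTEM \"us-patent-grant-v40-2004-12-02.dtd\" []>"
                + "<us-patent-grant><agents><agent><addressbook>"
                + "<orgname>Example Law Firm</orgname>"
                + "</addressbook></agent></agents></us-patent-grant>";
        System.out.println(evaluate(xml, "//agents/agent/addressbook/orgname")); // prints Example Law Firm
    }
}
```

If the patent files use entities that are declared only in the DTD, skipping the DTD may not be enough, and supplying the real DTD may be the safer route.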

    【Discussion】:

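    • I've already tried that, but I still get the same error. I've changed the original question to show the new code and stack trace, as requested. Thanks.
    • Did you try the builder.setEntityResolver code from the link?
    • Yes, I just tried it and got exactly the same stack trace.
    • OK, I've put in the stack trace and the code using setEntityResolver. I'm running short on time, so I may have to try a completely different approach. Thanks, guys.
    • Try setting "xml.org/sax/features/use-entity-resolver2" to false together with setEntityResolver.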
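A likely reason `setEntityResolver` made no difference in the question: the resolver is set on `db`, but `xPath.evaluate(..., source, ...)` receives the `InputSource`, so XPath parses it with a `DocumentBuilder` of its own (the trace shows `XPathImpl.evaluate` calling `DocumentBuilderImpl.parse`) and `db` is never used. A sketch of a fix, assuming Java 8+ for the lambda, is to parse with the builder that has the resolver and hand the `Document` to XPath. The class name, sample document and XPath expression are illustrative assumptions. The question's `/agents/adressbook/orgname` is an absolute path that only matches when `agents` is the root element, so the path (and the spelling `adressbook`) may need checking against the real patent XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ResolverThenXPath {

    // Hypothetical path for a trimmed-down patent document; check it against the real files.
    static final String ORG_NAME_PATH = "//agents/agent/addressbook/orgname";

    static String orgName(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the patent DTD with an empty one; return null to use default resolution for anything else.
        db.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.contains("us-patent-grant-v40-2004-12-02.dtd")
                        ? new InputSource(new StringReader(""))
                        : null);
        // Parse with the builder that actually has the resolver attached...
        Document document = db.parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // ...and give XPath the parsed DOM, so it does not parse the text again.
        return (String) xPath.evaluate(ORG_NAME_PATH, document, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE us-patent-grant SYSTEM \"us-patent-grant-v40-2004-12-02.dtd\" []>"
                + "<us-patent-grant><agents><agent><addressbook>"
                + "<orgname>Example Law Firm</orgname>"
                + "</addressbook></agent></agents></us-patent-grant>";
        System.out.println(orgName(xml)); // prints Example Law Firm
    }
}
```

With this structure, one `DocumentBuilder` and one `XPath` could also be created once outside the loop and reused for every line, instead of being created per document as in the question.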
    【Solution 2】:

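    Have you tried providing the DTD file it is trying to reference, e.g. by downloading it from us-patent-application-v40-2004-12-02.dtd?

    You could try putting this file in the same folder as the XML, or in the current directory of the parsing process (since you're in a hurry, try both).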
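If you use the real DTD, another option besides copying it into the working directory is to give the `InputSource` a system ID inside the folder that holds the DTD, so the relative name in the DOCTYPE is resolved against that folder. This is a sketch under stated assumptions: the folder and the `patent.xml` base name are hypothetical (the base file does not need to exist), and the demo writes a tiny stand-in DTD declaring one entity instead of the downloaded USPTO DTD, so a successful run shows the DTD was actually read.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DtdFromFolder {

    // Parse one line of XML, resolving its relative DTD reference against dtdFolder.
    static String orgName(String xml, File dtdFolder) throws Exception {
        InputSource source = new InputSource(new StringReader(xml));
        // The base URI only has to point into the folder; "patent.xml" itself need not exist.
        source.setSystemId(new File(dtdFolder, "patent.xml").toURI().toString());
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        return (String) XPathFactory.newInstance().newXPath()
                .evaluate("//agents/agent/addressbook/orgname", document, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in DTD in a temporary folder; a real run would use the downloaded USPTO DTD.
        Path dir = Files.createTempDirectory("dtd-demo");
        Files.write(dir.resolve("us-patent-grant-v40-2004-12-02.dtd"),
                "<!ENTITY firm \"Example Law Firm\">".getBytes(StandardCharsets.UTF_8));
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE us-patent-grant SYSTEM \"us-patent-grant-v40-2004-12-02.dtd\" []>"
                + "<us-patent-grant><agents><agent><addressbook>"
                + "<orgname>&firm;</orgname>"
                + "</addressbook></agent></agents></us-patent-grant>";
        // &firm; is declared only in the DTD, so expanding it shows the DTD was loaded from dir.
        System.out.println(orgName(xml, dir.toFile())); // prints Example Law Firm
    }
}
```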

    【Discussion】:
