【Question title】: Is it possible to keep the input markup with DaisyDiff?
【Posted】: 2018-01-15 12:07:18
【Question】:

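So, I'm currently trying to implement a diff tool that can compare two HTML files. After some research I ended up using DaisyDiff. Since this tool seems to be somewhat dated by now, I had a hard time finding examples that still work. I found this question on Stack Overflow, which helped because I didn't know what to pass as the third and fourth arguments. The current state of my implementation: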

String html1 = "<html class='foobar'>Hello</html>";
String html2 = "<html>Bye</html>";

try {
    StringWriter finalResult = new StringWriter();
    SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance(); 
    TransformerHandler result = tf.newTransformerHandler(); 
    result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); 
    result.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes"); 
    result.getTransformer().setOutputProperty(OutputKeys.METHOD, "html"); 
    result.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8"); 
    result.setResult(new StreamResult(finalResult)); 

    ContentHandler postProcess = result; 

    DaisyDiff.diffHTML(new InputSource(new StringReader(html1)), new InputSource(new StringReader(html2)), postProcess, null, Locale.GERMAN);

    System.out.println(finalResult.toString());

} catch (TransformerConfigurationException | SAXException | IOException e) {
    // newTransformerHandler() can throw TransformerConfigurationException,
    // so it has to be handled here as well
    e.printStackTrace();
}

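The problem is that DaisyDiff.diffHTML really only diffs the plain text and strips the markup from the input entirely. For example, if I pass these two strings as input: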

String first = "<div>Hello</div>";
String second = "<div>Bye</div>";

I would expect this output:

<div><span class="removed">Hello</span><span class="added">Bye</span></div>

But all I get is this:

<span class="removed">Hello</span><span class="added">Bye</span>

【Comments】:

    Tags: java html diff


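    【Solution 1】:

    So, I finally got it working. After I found this example code on GitHub, it became clear that the problem was not the ContentHandler, as I had suspected. So, if anyone else needs to diff some HTML and doesn't want to waste days looking for a good (and working) example, this is how I got it to work.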
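To see why the ContentHandler is not the culprit, the handler setup used in this thread can be exercised on its own. The following is a minimal sketch using only the JDK; the class name `SaxSerializeDemo` and the hand-fired SAX events are illustrative assumptions and are not DaisyDiff output. It feeds a `<div>` wrapping a `<span class="removed">` into a `TransformerHandler` configured like the one above and returns what gets serialized.

```java
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.helpers.AttributesImpl;

public class SaxSerializeDemo {

    // Fires hand-written SAX events (a stand-in for what a differ would emit)
    // into a TransformerHandler and returns the serialized text.
    public static String serialize() throws Exception {
        StringWriter out = new StringWriter();
        SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = tf.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.setResult(new StreamResult(out));

        AttributesImpl noAttrs = new AttributesImpl();
        AttributesImpl removed = new AttributesImpl();
        removed.addAttribute("", "class", "class", "CDATA", "removed");

        // The handler only writes what it is told: element events become tags.
        handler.startDocument();
        handler.startElement("", "div", "div", noAttrs);
        handler.startElement("", "span", "span", removed);
        handler.characters("Hello".toCharArray(), 0, 5);
        handler.endElement("", "span", "span");
        handler.endElement("", "div", "div");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```

If the handler were dropping markup, the `div` and `span` tags fired here would be missing from the result. Since it serializes whatever events it receives, the missing wrapper element has to be lost earlier, before any events reach the handler.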

    First, you need to download the NekoHTML dependency, which is basically an HTML parser.

    This is what my import block looks like:

    import java.io.IOException;
    import java.io.StringReader;
    import java.io.StringWriter;
    import java.util.Locale;
    
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    
    import org.outerj.daisy.diff.helper.NekoHtmlParser;
    import org.outerj.daisy.diff.html.HTMLDiffer;
    import org.outerj.daisy.diff.html.HtmlSaxDiffOutput;
    import org.outerj.daisy.diff.html.TextNodeComparator;
    import org.outerj.daisy.diff.html.dom.DomTreeBuilder;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    

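    And here is my complete implementation of the differ, which does not strip the actual markup (note that this is not my code, I just took it from the example linked above!):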

    public static String diffHtml(String first, String second) throws TransformerConfigurationException, IOException, SAXException {
    
        // Serialize the SAX events produced by the diff into a String as HTML
        StringWriter finalResult = new StringWriter();
        SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    
        TransformerHandler result = tf.newTransformerHandler();
        result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        result.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        result.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        result.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        result.setResult(new StreamResult(finalResult));
    
        ContentHandler postProcess = result;
    
        Locale locale = Locale.getDefault();
        String prefix = "diff";
    
        // NekoHTML parses the (possibly malformed) HTML input
        NekoHtmlParser cleaner = new NekoHtmlParser();
    
        InputSource oldSource = new InputSource(new StringReader(first));
        InputSource newSource = new InputSource(new StringReader(second));
    
        // Build a DOM tree for the old document and wrap it in a comparator
        DomTreeBuilder oldHandler = new DomTreeBuilder();
        cleaner.parse(oldSource, oldHandler);
        TextNodeComparator leftComparator = new TextNodeComparator(oldHandler, locale);
    
        // Same for the new document
        DomTreeBuilder newHandler = new DomTreeBuilder();
        cleaner.parse(newSource, newHandler);
        TextNodeComparator rightComparator = new TextNodeComparator(newHandler, locale);
    
        // The diff output is written to the ContentHandler set up above
        HtmlSaxDiffOutput output = new HtmlSaxDiffOutput(postProcess, prefix);
    
        HTMLDiffer differ = new HTMLDiffer(output);
        differ.diff(leftComparator, rightComparator);
    
        System.out.println(finalResult.toString());
    
        return finalResult.toString();
    }
    

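    Oh, and if you run into errors involving the IProgressMonitor interface, note that it has moved from org.eclipse.core.runtime to org.eclipse.equinox.common, so remember to use the right dependency. I stumbled over this little issue too. I hope this helps!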

    【Comments】:
