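【Question title】: Add a LaTeX-style equation to Word (.docx) using Apache POI
【Posted】: 2017-10-07 18:16:17
【Question】:

I am trying to use Apache POI to generate an MS Word (.docx) file automatically. The input to the Java program contains text, images, and LaTeX-style equations (enclosed in $$ or [ ]).

My question is how to add such a LaTeX-style equation to the document so that, when the .docx file is edited in MS Word, Word recognizes it as a native Word equation (OMML).

Note: I think the LaTeX equation should first be converted to MathML. If so, how do I add MathML to a .docx file?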

【Question comments】:

    Tags: java ms-word apache-poi


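    【Answer 1】:

    Microsoft provides XSLT stylesheets for transforming OMML into MathML (OMML2MML.XSL) and MathML into OMML (MML2OMML.XSL).

    If you have Microsoft Office installed, you will find these files in the Office program directory. On my system:

    With MML2OMML.XSL we can transform MathML into OMML using XSLT.

    Example: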

    import java.io.*;
    import org.apache.poi.xwpf.usermodel.*;
    
    import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
    import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
    import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
    import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;
    
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;
    
    import org.apache.xmlbeans.XmlCursor;
    
    /*
    needs the full ooxml-schemas-*.jar or poi-ooxml-full-5.0.0.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
    */
    
    public class CreateWordFormulaFromMathML {
    
     static File stylesheet = new File("MML2OMML.XSL");
     static TransformerFactory tFactory = TransformerFactory.newInstance();
     static StreamSource stylesource = new StreamSource(stylesheet); 
    
     static CTOMath getOMML(String mathML) throws Exception {
      Transformer transformer = tFactory.newTransformer(stylesource);
    
      StringReader stringreader = new StringReader(mathML);
      StreamSource source = new StreamSource(stringreader);
    
      StringWriter stringwriter = new StringWriter();
      StreamResult result = new StreamResult(stringwriter);
      transformer.transform(source, result);
    
      String ooML = stringwriter.toString();
      stringwriter.close();
    
      CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
      CTOMath ctOMath = ctOMathPara.getOMathArray(0);
    
      // to make this work in Word 2007 as well, special font settings are necessary
      XmlCursor xmlcursor = ctOMath.newCursor();
      while (xmlcursor.hasNextToken()) {
       XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
       if (tokentype.isStart()) {
        if (xmlcursor.getObject() instanceof CTR) {
         CTR cTR = (CTR)xmlcursor.getObject();
         cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
         cTR.getRPr2().getRFonts().setHAnsi("Cambria Math"); // up to apache poi 4.1.2
         //cTR.getRPr2().getRFontsArray(0).setHAnsi("Cambria Math"); // since apache poi 5.0.0
        }
       }
      }
    
      return ctOMath;
     }
    
     public static void main(String[] args) throws Exception {
    
      XWPFDocument document = new XWPFDocument();
    
      XWPFParagraph paragraph = document.createParagraph();
      XWPFRun run = paragraph.createRun();
      run.setText("The Pythagorean theorem: ");
    
      String mathML = 
        "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">" 
       +"<mrow>"
       +"<msup><mi>a</mi><mn>2</mn></msup><mo>+</mo><msup><mi>b</mi><mn>2</mn></msup><mo>=</mo><msup><mi>c</mi><mn>2</mn></msup>"
       +"</mrow>"
       +"</math>";
    
      CTOMath ctOMath = getOMML(mathML);
    System.out.println(ctOMath);
    
      CTP ctp = paragraph.getCTP();
      ctp.setOMathArray(new CTOMath[]{ctOMath});
    
      paragraph = document.createParagraph();
      run = paragraph.createRun();
      run.setText("The Quadratic Formula: ");
    
      mathML = 
        "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
       +"<mrow>" 
       +"<mi>x</mi><mo>=</mo><mfrac><mrow><mrow><mo>-</mo><mi>b</mi></mrow><mo>±</mo><msqrt><mrow><msup><mi>b</mi><mn>2</mn></msup><mo>-</mo><mrow><mn>4</mn><mo>⁢</mo><mi>a</mi><mo>⁢</mo><mi>c</mi></mrow></mrow></msqrt></mrow><mrow><mn>2</mn><mo>⁢</mo><mi>a</mi></mrow></mfrac>"
       +"</mrow>"
       +"</math>";
    
      ctOMath = getOMML(mathML);
    System.out.println(ctOMath);
    
      ctp = paragraph.getCTP();
      ctp.setOMathArray(new CTOMath[]{ctOMath});
      
      FileOutputStream out = new FileOutputStream("CreateWordFormulaFromMathML.docx");
      document.write(out);
      out.close();
      document.close();
    
     }
    }
    

    Note that this code needs the full ooxml-schemas-*.jar or poi-ooxml-full-5.0.0.jar, as mentioned in https://poi.apache.org/faq.html#faq-N10025.
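A side note on the XSLT step: getOMML above builds a new Transformer from the stylesheet source on every call. A possible variation, sketched here with the standard javax.xml.transform API only, compiles the stylesheet once into a Templates object and reuses it. The class name CachedStylesheet and the tiny DEMO_XSL stylesheet are made up for this sketch (DEMO_XSL merely stands in for MML2OMML.XSL so the snippet runs without Office installed).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: compiles an XSLT stylesheet once and reuses it.
class CachedStylesheet {

    // Stand-in stylesheet (assumption for this sketch): renames <mi> to <r>.
    // In the real program the source would be new StreamSource(new File("MML2OMML.XSL")).
    static final String DEMO_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"mi\"><r><xsl:value-of select=\".\"/></r></xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled form of the stylesheet; can create any number of Transformers.
    private final Templates templates;

    CachedStylesheet(StreamSource stylesheet) throws Exception {
        templates = TransformerFactory.newInstance().newTemplates(stylesheet);
    }

    // Runs the compiled stylesheet over one XML string and returns the result as text.
    String transform(String xml) throws Exception {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        CachedStylesheet cached =
            new CachedStylesheet(new StreamSource(new StringReader(DEMO_XSL)));
        // Both calls reuse the same compiled stylesheet.
        System.out.println(cached.transform("<mi>a</mi>"));
        System.out.println(cached.transform("<mi>b</mi>"));
    }
}
```

In getOMML, the string returned by such a transform method would then be passed to CTOMathPara.Factory.parse exactly as before.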


    There are, of course, Java libraries for converting LaTeX to MathML. For example: http://www.fmath.info/java/download.jsp

    With fmath-mathml-java-test-project-b1124.zip downloaded, and /lib/fmath-mathml-java.jar and /lib/jdom-2.0.6.jar on the classpath, the following works:

    import java.io.*;
    import org.apache.poi.xwpf.usermodel.*;
    
    import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
    import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
    import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
    import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;
    
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;
    
    import org.apache.xmlbeans.XmlCursor;
    
    /*
    needs the full ooxml-schemas-1.3.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
    */
    
    public class CreateWordFormulaFromLaTeX {
    
     static File stylesheet = new File("MML2OMML.XSL");
     static TransformerFactory tFactory = TransformerFactory.newInstance();
     static StreamSource stylesource = new StreamSource(stylesheet); 
    
     static CTOMath getOMML(String mathML) throws Exception {
      Transformer transformer = tFactory.newTransformer(stylesource);
    
      StringReader stringreader = new StringReader(mathML);
      StreamSource source = new StreamSource(stringreader);
    
      StringWriter stringwriter = new StringWriter();
      StreamResult result = new StreamResult(stringwriter);
      transformer.transform(source, result);
    
      String ooML = stringwriter.toString();
      stringwriter.close();
    
      CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
      CTOMath ctOMath = ctOMathPara.getOMathArray(0);
    
      //for making this to work with Office 2007 Word also, special font settings are necessary
      XmlCursor xmlcursor = ctOMath.newCursor();
      while (xmlcursor.hasNextToken()) {
       XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
       if (tokentype.isStart()) {
        if (xmlcursor.getObject() instanceof CTR) {
         CTR cTR = (CTR)xmlcursor.getObject();
         cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
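         cTR.getRPr2().getRFonts().setHAnsi("Cambria Math"); // up to apache poi 4.1.2
         //cTR.getRPr2().getRFontsArray(0).setHAnsi("Cambria Math"); // since apache poi 5.0.0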
        }
       }
      }
    
      return ctOMath;
     }
    
     public static void main(String[] args) throws Exception {
    
      XWPFDocument document = new XWPFDocument();
    
      XWPFParagraph paragraph = document.createParagraph();
      XWPFRun run = paragraph.createRun();
      run.setText("The Pythagorean theorem: ");
    
      String latex = "$a^2 + b^2 = c^2$";
    
      String mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
      mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
    System.out.println(mathML);
    
      CTOMath ctOMath = getOMML(mathML);
    System.out.println(ctOMath);
    
      CTP ctp = paragraph.getCTP();
      ctp.setOMathArray(new CTOMath[]{ctOMath});
    
    
      paragraph = document.createParagraph();
      run = paragraph.createRun();
      run.setText("The Quadratic Formula: ");
    
      latex = "$x=\\frac{-b\\pm\\sqrt{b^2-4ac}}{2a}$";
    
      mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
      mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
      mathML = mathML.replaceAll("&plusmn;", "±");
    System.out.println(mathML);
    
      ctOMath = getOMML(mathML);
    System.out.println(ctOMath);
    
      ctp = paragraph.getCTP();
      ctp.setOMathArray(new CTOMath[]{ctOMath});
    
      // close the output stream explicitly instead of leaking it
      try (FileOutputStream out = new FileOutputStream("CreateWordFormulaFromLaTeX.docx")) {
       document.write(out);
      }
      document.close();
    
     }
    }
    

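    But every conversion is a possible source of errors. So LaTeX -> MathML -> OMML will be more error-prone than MathML -> OMML alone.

    In this case, fmath.conversion.ConvertFromLatexToMathML.convertToMathML produces MathML without a namespace declaration. The XSLT needs one, so it has to be added manually.

    fmath.conversion.ConvertFromLatexToMathML.convertToMathML also emits HTML entities, which MML2OMML.XSL does not know. That is why "&plusmn;" in the example has to be replaced with "±".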
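The two manual fixes (missing namespace, HTML entities) could be collected in one small helper and checked before the string is handed to the XSLT. The following is only a sketch using the JDK's own XML parser: the class MathMLCleanup, its method names, and the short entity table are invented for illustration, and the table is deliberately incomplete (only "&plusmn;" comes from the example above).

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for cleaning up converter output before the XSLT step.
class MathMLCleanup {

    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    // Illustrative, incomplete mapping from HTML entity names to characters.
    static final Map<String, String> ENTITIES = Map.of(
        "&plusmn;", "\u00B1",
        "&times;", "\u00D7",
        "&minus;", "\u2212");

    // Adds the MathML namespace if none is declared and replaces known HTML entities.
    static String normalize(String mathML) {
        String result = mathML;
        if (!result.contains("xmlns=")) {
            // Matches "<math" only when followed by whitespace or ">".
            result = result.replaceFirst("<math(?=[\\s>])", "<math xmlns=\"" + MATHML_NS + "\"");
        }
        for (Map.Entry<String, String> entity : ENTITIES.entrySet()) {
            result = result.replace(entity.getKey(), entity.getValue());
        }
        return result;
    }

    // Returns true if the string parses as well-formed XML.
    // An undeclared entity such as &plusmn; makes the parse fail.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parser errors off stderr
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "<math><mi>b</mi><mo>&plusmn;</mo><mi>c</mi></math>";
        System.out.println("raw well-formed:   " + isWellFormed(raw));
        String fixed = normalize(raw);
        System.out.println("fixed well-formed: " + isWellFormed(fixed));
        System.out.println(fixed);
    }
}
```

A check like isWellFormed reports the problem before the XSLT step runs, which makes it easier to tell a LaTeX -> MathML fault from a MathML -> OMML fault.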


    Maybe SnuggleTeX would be a better library?

    With it downloaded and snuggletex-core-1.2.2.jar on the classpath, the following changes to the previous example work:

    ...
      String latex = "$a^2 + b^2 = c^2$";
    
      uk.ac.ed.ph.snuggletex.SnuggleEngine engine = new uk.ac.ed.ph.snuggletex.SnuggleEngine();
      uk.ac.ed.ph.snuggletex.SnuggleSession session = engine.createSession();
      uk.ac.ed.ph.snuggletex.SnuggleInput input = new uk.ac.ed.ph.snuggletex.SnuggleInput(latex);
      session.parseInput(input);
      String mathML = session.buildXMLString();
    System.out.println(mathML);
    
    /*
      String mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
      mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
    System.out.println(mathML);
    */
    
      CTOMath ctOMath = getOMML(mathML);
    System.out.println(ctOMath);
    
    ...
    
      latex = "$x=\\frac{-b\\pm\\sqrt{b^2-4ac}}{2a}$";
    
      engine = new uk.ac.ed.ph.snuggletex.SnuggleEngine();
      session = engine.createSession();
      input = new uk.ac.ed.ph.snuggletex.SnuggleInput(latex);
      session.parseInput(input);
      mathML = session.buildXMLString();
    System.out.println(mathML);
    
    /*
      mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
      mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
      mathML = mathML.replaceAll("&plusmn;", "±");
    System.out.println(mathML);
    */
    
      ctOMath = getOMML(mathML);
    System.out.println(ctOMath);
    ...
    

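    No manual intervention is needed. At least not with the given LaTeX examples.

    【Comments】:

    • Thanks! It seems to work very well. Is there a way to convert LaTeX-style equations to MathML from Java?
    • To make this work in Word 2007 as well, special font settings are necessary; see the addition to my code. As for your question about converting LaTeX-style equations to MathML from Java: what did your own research turn up? There are certainly ways, but every conversion is a possible source of errors, so LaTeX -> MathML -> OMML will be more error-prone than MathML -> OMML alone. So find out by trial and error yourself.
    • @Axel Richter Providing the input in LaTeX style is much easier. Also, the only tool I found that can convert LaTeX to MathML is MathJax, but it is a web application, not Java.
    • @ashwin bande: There are certainly Java libraries for converting LaTeX to MathML. The best I have found so far is SnuggleTeX. See my addition. But you will have to test it yourself.
    • SnuggleTeX is the perfect solution. Even better than fmath. Thanks again!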