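【Question title】: XML to CSV conversion in Java
【Posted】: 2016-09-27 09:56:09
【Question】:

I am converting XML to CSV data. By looking at various examples, I was able to write code that parses an XML file and produces a CSV file. However, the CSV file my code produces does not contain all of the tags present in the XML file.

I use an XSLT stylesheet for the conversion. I am new to XSLT, so I believe the problem is in my stylesheet.

Here is the Java code: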

package com.adarsh.conversions;

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

class XMLtoCsVConversion {

    public static void main(String[] args) throws Exception {
        File stylesheet = new File("style.xsl");
        File xmlSource = new File("sample_data.xml");

        // Parse the XML input into a DOM tree
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(xmlSource);

        // Compile the stylesheet into a Transformer
        StreamSource stylesource = new StreamSource(stylesheet);
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(stylesource);

        // Apply the stylesheet and write the text output to the CSV file
        Source source = new DOMSource(document);
        Result outputTarget = new StreamResult(new File("/tmp/x.csv"));
        transformer.transform(source, outputTarget);
    }
}

Here is the XSLT I am using:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="*/*[1]/*">
      <xsl:value-of select="name()" />
      <xsl:if test="not(position() = last())">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="*/*" mode="row"/>
  </xsl:template>

  <xsl:template match="*" mode="row">
    <xsl:apply-templates select="*" mode="data" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="*" mode="data">
    <xsl:choose>
      <xsl:when test="contains(text(),',')">
        <xsl:text>&quot;</xsl:text>
        <xsl:call-template name="doublequotes">
          <xsl:with-param name="text" select="text()" />
        </xsl:call-template>
        <xsl:text>&quot;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="." />
      </xsl:otherwise>
    </xsl:choose>
    <xsl:if test="position() != last()">,</xsl:if>
  </xsl:template>

  <xsl:template name="doublequotes">
    <xsl:param name="text" />
    <xsl:choose>
      <xsl:when test="contains($text,'&quot;')">
        <xsl:value-of select="concat(substring-before($text,'&quot;'),'&quot;&quot;')" />
        <xsl:call-template name="doublequotes">
          <xsl:with-param name="text" select="substring-after($text,'&quot;')" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

Here is the XML file I want to convert to CSV:

<?xml version="1.0"?>

<school id="100" name="WGen School">

    <grade id="1">
        <classroom id="101" name="Mrs. Jones' Math Class">
            <teacher id="10100000001" first_name="Barbara" last_name="Jones"/>

            <student id="10100000010" first_name="Michael" last_name="Gil"/>
            <student id="10100000011" first_name="Kimberly" last_name="Gutierrez"/>
            <student id="10100000013" first_name="Toby" last_name="Mercado"/>
            <student id="10100000014" first_name="Lizzie" last_name="Garcia"/>
            <student id="10100000015" first_name="Alex" last_name="Cruz"/>
        </classroom>


        <classroom id="102" name="Mr. Smith's PhysEd Class">
            <teacher id="10200000001" first_name="Arthur" last_name="Smith"/>
            <teacher id="10200000011" first_name="John" last_name="Patterson"/>

            <student id="10200000010" first_name="Nathaniel" last_name="Smith"/>
            <student id="10200000011" first_name="Brandon" last_name="McCrancy"/>
            <student id="10200000012" first_name="Elizabeth" last_name="Marco"/>
            <student id="10200000013" first_name="Erica" last_name="Lanni"/>
            <student id="10200000014" first_name="Michael" last_name="Flores"/>
            <student id="10200000015" first_name="Jasmin" last_name="Hill"/>
            <student id="10200000016" first_name="Brittany" last_name="Perez"/>
            <student id="10200000017" first_name="William" last_name="Hiram"/>
            <student id="10200000018" first_name="Alexis" last_name="Reginald"/>
            <student id="10200000019" first_name="Matthew" last_name="Gayle"/>
        </classroom>

        <classroom id="103" name="Brian's Homeroom">
            <teacher id="10300000001" first_name="Brian" last_name="O'Donnell"/>
        </classroom>
    </grade>
</school>

The expected result is:

classroom id, classroom_name, teacher_1_id, teacher_1_last_name, teacher_1_first_name, teacher_2_id, teacher_2_last_name, teacher_2_first_name, student_id, student_last_name, student_first_name, grade
101, Mrs. Jones' Math Class, 10100000001, Jones, Barbara, , , , 10100000010, Gil, Michael, 2
101, Mrs. Jones' Math Class, 10100000001, Jones, Barbara, , , , 10100000011, Gutierrez, Kimberly, 2
101, Mrs. Jones' Math Class, 10100000001, Jones, Barbara, , , , 10100000013, Mercado, Toby, 1
101, Mrs. Jones' Math Class, 10100000001, Jones, Barbara, , , , 10100000014, Garcia, Lizzie, 1
101, Mrs. Jones' Math Class, 10100000001, Jones, Barbara, , , , 10100000015, Cruz, Alex, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000010, Smith, Nathaniel, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000011, McCrancy, Brandon, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000012, Marco, Elizabeth, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000013, Lanni, Erica, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000014, Flores, Michael, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000015, Hill, Jasmin, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000016, Perez, Brittany, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000017, Hiram, William, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000018, Reginald, Alexis, 1
102, Mr. Smith's PhysEd Class, 10200000001, Smith, Arthur, 10200000011, Patterson, John, 10200000019, Gayle, Matthew, 1
103, Brian's Homeroom, 10300000001, O'Donnell, Brian, , , , , , ,

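But all I get is:

classroomclassroomclassroom

Can someone help me solve this?

P.S. I have already looked at other questions on Stack Overflow about CSV to XML conversion, and I used the information in those posts to help me create the XSL.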

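【Comments】:

  • Your expected output is unreadable. Please post it formatted as code.
  • That output is not CSV (comma-separated values) but tab-separated values. I updated it to show the tabs visually, so we have a clue about what is going on. I believe you have wrapped the lines, because it still doesn't look right.
  • @michael.hor257k: Thank you for pointing that out. I had built the expected format in Excel, so it had no commas. Sorry about that. I have now edited the question to show a proper CSV file, with each value separated by a comma.
  • @AdarshBhat Your expected output has no quoted values. Can your input contain commas? If so, in which fields? Also, I don't see where student_grade is supposed to come from.
  • @michael.hor257k No, my input does not contain commas. Also, student_grade refers to the grade tag in the XML. I have changed the expected output to show just grade instead of student_grade.

Tags: java xml csv xslt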


【Answer 1】:

I suggest you use this as a starting point:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>

<xsl:template match="/school">
    <!-- header -->
    <xsl:text>classroom id,classroom_name,teacher_1_id,teacher_1_last_name,teacher_1_first_name,teacher_2_id,teacher_2_last_name,teacher_2_first_name,student_id,student_last_name,student_first_name,grade&#10;</xsl:text>
    <!-- data -->
    <xsl:for-each select="grade/classroom">
        <!-- classroom data -->
        <xsl:variable name="classroom-data">
            <xsl:value-of select="@id" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="@name" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="teacher[1]/@id" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="teacher[1]/@last_name" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="teacher[1]/@first_name" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="teacher[2]/@id" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="teacher[2]/@last_name" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="teacher[2]/@first_name" />
            <xsl:text>,</xsl:text>
        </xsl:variable>
        <xsl:variable name="grade-id" select="../@id" />
        <xsl:for-each select="student">
            <xsl:copy-of select="$classroom-data"/>
            <!-- student data -->
            <xsl:value-of select="@id" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="@last_name" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="@first_name" />
            <xsl:text>,</xsl:text>
            <xsl:value-of select="$grade-id" />
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Applied to your input, the result will be:

classroom id,classroom_name,teacher_1_id,teacher_1_last_name,teacher_1_first_name,teacher_2_id,teacher_2_last_name,teacher_2_first_name,student_id,student_last_name,student_first_name,grade
101,Mrs. Jones' Math Class,10100000001,Jones,Barbara,,,,10100000010,Gil,Michael,1
101,Mrs. Jones' Math Class,10100000001,Jones,Barbara,,,,10100000011,Gutierrez,Kimberly,1
101,Mrs. Jones' Math Class,10100000001,Jones,Barbara,,,,10100000013,Mercado,Toby,1
101,Mrs. Jones' Math Class,10100000001,Jones,Barbara,,,,10100000014,Garcia,Lizzie,1
101,Mrs. Jones' Math Class,10100000001,Jones,Barbara,,,,10100000015,Cruz,Alex,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000010,Smith,Nathaniel,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000011,McCrancy,Brandon,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000012,Marco,Elizabeth,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000013,Lanni,Erica,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000014,Flores,Michael,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000015,Hill,Jasmin,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000016,Perez,Brittany,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000017,Hiram,William,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000018,Reginald,Alexis,1
102,Mr. Smith's PhysEd Class,10200000001,Smith,Arthur,10200000011,Patterson,John,10200000019,Gayle,Matthew,1
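
To try a stylesheet like this from Java without touching the file system, something along these lines may help. It is only a sketch: the class name `XsltToCsv`, the `transform` helper, and the `STYLESHEET` constant are made up for illustration, and the embedded stylesheet is a cut-down variant of the one above (classroom id, classroom name, student id, student last name, and grade only), not the full answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the original question or answer.
final class XsltToCsv {

    // Cut-down variant of the answer's stylesheet: one row per student,
    // prefixed with the classroom data and followed by the grade id.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"/school\">"
        + "<xsl:for-each select=\"grade/classroom\">"
        + "<xsl:variable name=\"classroom\" select=\"concat(@id, ',', @name, ',')\"/>"
        + "<xsl:variable name=\"grade-id\" select=\"../@id\"/>"
        + "<xsl:for-each select=\"student\">"
        + "<xsl:value-of select=\"concat($classroom, @id, ',', @last_name, ',', $grade-id, '&#10;')\"/>"
        + "</xsl:for-each>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XML document, both given as strings,
    // and returns the transformation output as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Small made-up sample in the same shape as the question's XML.
        String xml =
              "<school id=\"100\"><grade id=\"1\">"
            + "<classroom id=\"101\" name=\"Math\">"
            + "<student id=\"1\" last_name=\"Gil\"/>"
            + "<student id=\"2\" last_name=\"Cruz\"/>"
            + "</classroom>"
            + "<classroom id=\"103\" name=\"Homeroom\"/>"
            + "</grade></school>";
        System.out.print(transform(xml, STYLESHEET));
    }
}
```

As with the stylesheet above, a classroom without students (id 103 in the sample) produces no row, because rows are generated per `student`. The line terminator written by the text output method may depend on the platform and the XSLT processor, so comparisons against expected output are safer after normalizing line endings.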

Note that this assumes your input fields contain no commas or double quotes.
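
If that assumption ever stops holding, each field would need quoting. The stylesheet in the question already attempts this in XSLT with its `doublequotes` template; for cases where the rows are assembled in Java instead, the usual CSV convention (wrap the field in double quotes and double any embedded quote) can be sketched as below. The class `CsvField` and its methods are hypothetical names for illustration, not an existing library API.

```java
// Hypothetical helper, not part of the original question or answer.
final class CsvField {

    // Returns the value unchanged unless it contains a comma, a double quote,
    // or a line break; in that case wraps it in double quotes and doubles
    // every embedded double quote. A null value becomes an empty field.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins already-known field values into one CSV row (no line terminator).
    static String row(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up values: the second field contains a comma and gets quoted.
        System.out.println(row("101", "Math, advanced", "Gil"));
    }
}
```

Values such as `Mrs. Jones' Math Class` pass through unchanged, since an apostrophe is not special in CSV.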

【Comments】:

  • Thank you very much for your help :) I will work on improving my XSLT skills.