【Question Title】: Extracting Schema Information from XML to a data.frame
【Posted】: 2018-07-08 16:46:01
【Question】:

I'm trying to parse some XML result sets (generated from SQL queries passed through a SOAP API) that contain both schema information and data.

I've managed to get the data out with the XML package, but I'm struggling to extract the schema information into the R environment.

Example XML and row extraction

library(magrittr)
library(XML)

## Example XML to parse
file <- '<?xml version="1.0"?>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
    <xsd:complexType name="Row">
      <xsd:sequence>
        <xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/>
        <xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:schema>
  <Row>
    <Column0>foo</Column0>
    <Column1>1.2</Column1>
  </Row>
  <Row>
    <Column0>bar</Column0>
    <Column1>2.3</Column1>
  </Row>
</rowset>
'
## Extract the rows    
file %>% 
  XML::xmlParse() %>% 
  XML::xmlRoot() %>%
  XML::xmlElementsByTagName(.,"Row",TRUE) %>% 
  xmlToDataFrame() -> DF

print(DF)

This returns the following:

  Column0 Column1
1     foo     1.2
2     bar     2.3

Attempted schema extraction

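Ideally, I'd like to extract a second data frame containing the column information, so that I can use it to format my result set correctly. However, the furthest I've got is a list of elements. As I understand it, these are stored as external pointers, and I've been struggling to pull them back into the R environment.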

file %>% 
  XML::xmlParse() %>% 
  XML::xmlRoot() %>%
  XML::xmlElementsByTagName(.,"element",TRUE) 

This produces:

$schema.complexType.sequence.element
<xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/> 

$schema.complexType.sequence.element
<xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/> 
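A minimal sketch of pulling those nodes back into ordinary R objects, assuming the XML package's `getNodeSet()` and `xmlAttrs()` behave as documented. `xmlAttrs()` turns each node (an external pointer into the parsed document) into a plain named character vector, and `addNamespacePrefix = TRUE` keeps the `saw-sql:` prefix so that the XSD `type` and `saw-sql:type` attributes stay distinct. The `xsd` prefix binding passed to `getNodeSet()` and the variable names (`xml_txt`, `schema_df`) are illustrative choices, not anything from the original post.

```r
library(XML)

# Same schema as in the question (the <Row> data is omitted here).
xml_txt <- '<?xml version="1.0"?>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
    <xsd:complexType name="Row">
      <xsd:sequence>
        <xsd:element name="Column0" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula="Description"/>
        <xsd:element name="Column1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="numeric" saw-sql:displayFormula="Number"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:schema>
</rowset>'

doc <- xmlParse(xml_txt, asText = TRUE)

# Select the <xsd:element> nodes, binding the xsd prefix explicitly for XPath.
nodes <- getNodeSet(doc, "//xsd:element",
                    namespaces = c(xsd = "http://www.w3.org/2001/XMLSchema"))

# Convert each node to a named character vector of its attributes;
# prefixed names keep "type" and "saw-sql:type" from colliding.
attr_list <- lapply(nodes, xmlAttrs, addNamespacePrefix = TRUE)

# One row per element; data.frame() turns "saw-sql:type" into "saw.sql.type".
schema_df <- data.frame(do.call(rbind, attr_list), stringsAsFactors = FALSE)
print(schema_df)
```

Binding the columns by `rbind` assumes every `<xsd:element>` carries the same attributes in the same order, which holds for this example; a schema with optional attributes would need the vectors aligned by name first.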

Desired output

What I really want is the following:

     name       type minOccurs maxOccurs saw.sql.type saw.sql.displayFormula
1 Column0 xsd:string         0         1      varchar            Description
2 Column1 xsd:string         0         1      numeric                 Number

(The example output above was generated with:)

data.frame(name = c("Column0","Column1"),
           type = "xsd:string",
           minOccurs = "0",
           maxOccurs="1",
           `saw-sql:type`= c("varchar","numeric"),
           `saw-sql:displayFormula` = c("Description","Number"))
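Once both data frames exist, the schema can drive the type conversion that motivates the question. The sketch below assumes the column metadata has the shape of the desired output above; `apply_schema()` and the `sql_to_r` lookup are hypothetical helpers, and only the `varchar` and `numeric` types that appear in this example are mapped.

```r
# Stand-ins for the two data frames shown above: DF holds the rows,
# schema_df holds the per-column metadata.
DF <- data.frame(Column0 = c("foo", "bar"),
                 Column1 = c("1.2", "2.3"),
                 stringsAsFactors = FALSE)
schema_df <- data.frame(name = c("Column0", "Column1"),
                        saw.sql.type = c("varchar", "numeric"),
                        saw.sql.displayFormula = c("Description", "Number"),
                        stringsAsFactors = FALSE)

# Hypothetical mapping from saw-sql types to R coercion functions.
sql_to_r <- list(varchar = as.character, numeric = as.numeric)

apply_schema <- function(df, schema, converters) {
  for (i in seq_len(nrow(schema))) {
    col  <- schema$name[i]
    conv <- converters[[schema$saw.sql.type[i]]]  # NULL for unmapped types
    if (!is.null(conv) && col %in% names(df)) {
      # as.character() first, in case xmlToDataFrame() produced factors
      df[[col]] <- conv(as.character(df[[col]]))
    }
  }
  # Replace the generic ColumnN names with the display formulas.
  idx  <- match(names(df), schema$name)
  keep <- !is.na(idx)
  names(df)[keep] <- schema$saw.sql.displayFormula[idx[keep]]
  df
}

typed <- apply_schema(DF, schema_df, sql_to_r)
str(typed)
```

Keeping the type mapping in a lookup list means further saw-sql types (dates, integers) can be supported by adding entries rather than editing the loop.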

Any pointers on what I'm missing here would be much appreciated!

【Discussion】:

    Tags: r xml


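    【Solution 1】:
    library(XML)
    
    # Return attribute `stuff` from each element's attribute vector.
    # xmlToList() drops namespace prefixes, so both "type" and "saw-sql:type"
    # end up named "type"; `occurrence` picks the first or second match.
    get_stuff <- function(y, stuff, occurrence = 1) {
      unname(vapply(y, function(x) x[names(x) == stuff][occurrence], character(1)))
    }
    
    # Attribute vectors of the <xsd:element> nodes
    xml_list <- xmlToList(file)[["schema"]][["complexType"]][["sequence"]]
    
    DF <- data.frame(name = get_stuff(xml_list, "name"),
                     type = get_stuff(xml_list, "type"),
                     minOccurs = get_stuff(xml_list, "minOccurs"),
                     maxOccurs = get_stuff(xml_list, "maxOccurs"),
                     saw_sql_type = get_stuff(xml_list, "type", occurrence = 2),
                     saw_sql_displayFormula = get_stuff(xml_list, "displayFormula"),
                     stringsAsFactors = FALSE)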
    

    【Discussion】:

      【Solution 2】:
      data.frame(do.call(rbind, xmlToList(file)$schema$complexType$sequence), row.names=NULL)
           name       type minOccurs maxOccurs  type.1 displayFormula
      1 Column0 xsd:string         0         1 varchar    Description
      2 Column1 xsd:string         0         1 numeric         Number
      

      【Discussion】:
