Question title: XPath to select grandparent and specific uncle node
Posted: 2014-10-31 14:19:26
Question:

I'm using XPath in R and have an XML structure like this:

library(XML)

xml1 <- xmlParse('
<L0>
    <L1>
        <ID>Get this ID</ID>
        <L1N1>Ignore node 1</L1N1>
        <L1N2>
            <L2>
                <L2N1>Get this node and all others in L2</L2N1>
            </L2>
        </L1N2>
        <L1N3>Ignore node 3</L1N3>
    </L1>
    <L1>
        <ID>Get this ID</ID>
        <L1N1>Ignore node 1</L1N1>
        <L1N2>
            <L2>
                <L2N1>Get this node and all others in L2</L2N1>
            </L2>
        </L1N2>
        <L1N4>Ignore node 4</L1N4>
    </L1>
    <L1>
        <ID>Ignore this ID</ID>
        <L1N1>Ignore node 1</L1N1>
        <L1N3>Ignore node 3</L1N3>
        <L1N4>Ignore node 4</L1N4>
    </L1>
</L0>
                 ', asText = TRUE)  # parse the string itself, not a file name

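I want to extract each L2 node together with one uncle node (e.g. ID), but not the other uncle nodes. Each extracted result should come back under its grandparent node L1. This is the desired output: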

## [[1]]
## <L1>
##   <ID>Get this ID</ID>
##   <L1N2>
##     <L2>
##       <L2N1>Get this node and all others in L2</L2N1>
##     </L2>
##   </L1N2>
## </L1> 

## [[2]]
## <L1>
##   <ID>Get this ID</ID>
##   <L1N2>
##     <L2>
##       <L2N1>Get this node and all others in L2</L2N1>
##     </L2>
##   </L1N2>
## </L1>

I can get the L1 nodes that have an L2 descendant:

getNodeSet(xml1, "//L1[descendant::L2]")
## [[1]]
## <L1>
##   <ID>Get this ID</ID>
##   <L1N1>Ignore node 1</L1N1> ## *Want to exclude this*
##   <L1N2>
##     <L2>
##       <L2N1>Get this node and all others in L2</L2N1>
##     </L2>
##   </L1N2>
##   <L1N3>Ignore node 3</L1N3> ## *Want to exclude this*
## </L1> 
## 
## [[2]]
## <L1>
##   <ID>Get this ID</ID>
##   <L1N1>Ignore node 1</L1N1> ## *Want to exclude this*
##   <L1N2>
##     <L2>
##       <L2N1>Get this node and all others in L2</L2N1>
##     </L2>
##   </L1N2>
##   <L1N4>Ignore node 4</L1N4> ## *Want to exclude this*
## </L1>

...but this includes the uncles I don't want. I can exclude those uncles and select only the children of L1 that I want:

getNodeSet(xml1, "//L1/*[self::ID | child::L2]")
## [[1]]
## <ID>Get this ID</ID> 
##   
## [[2]]
## <L1N2>
##   <L2>
##     <L2N1>Get this node and all others in L2</L2N1>
##   </L2>
## </L1N2> 
## 
## [[3]]
## <ID>Get this ID</ID> 
##   
## [[4]]
## <L1N2>
##   <L2>
##     <L2N1>Get this node and all others in L2</L2N1>
##   </L2>
## </L1N2> 
## 
## [[5]]
## <ID>Ignore this ID</ID>

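...but now the ID nodes and the L1N2 nodes (which hold L2) come back as separate results instead of under L1, and the result also includes the ID from the third L1 node, which has no L2.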
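A predicate on the L1 step at least removes the third L1 from the result. The sketch below assumes the XML package, as in the question, and a compact copy of xml1 with the same element names but shorter text and no whitespace inside each L1. It keeps only L1 elements that have an L2 descendant, then only their ID child and any child that contains an L2. The result is still four separate nodes rather than two L1 elements, because an XPath 1.0 query can only select nodes that already exist in the document; it cannot build a new, pruned tree.

```r
library(XML)

# Compact copy of the question's xml1 (same element names, shorter text)
xml1 <- xmlParse('<L0>
  <L1><ID>Get this ID</ID><L1N1>Ignore node 1</L1N1><L1N2><L2><L2N1>Get this node</L2N1></L2></L1N2><L1N3>Ignore node 3</L1N3></L1>
  <L1><ID>Get this ID</ID><L1N1>Ignore node 1</L1N1><L1N2><L2><L2N1>Get this node</L2N1></L2></L1N2><L1N4>Ignore node 4</L1N4></L1>
  <L1><ID>Ignore this ID</ID><L1N1>Ignore node 1</L1N1><L1N3>Ignore node 3</L1N3><L1N4>Ignore node 4</L1N4></L1>
</L0>', asText = TRUE)

# Only L1 elements with an L2 descendant; within them, only the ID child
# and any child that contains an L2
nodes <- getNodeSet(xml1, "//L1[.//L2]/*[self::ID or .//L2]")

# Still a flat list of four nodes: ID, L1N2, ID, L1N2
vapply(nodes, xmlName, character(1))
```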

Can XPath return the desired result? If not, is there something else in R I can use to get it?

Tags: xml r xpath


    Answer 1:

    This seems to do what you want (using your xml1):

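    # Keep only the ID and L1N2 (which holds L2) children of an L1 node.
    # removeChildren() edits the parsed tree, so xml1 itself is modified.
    trim <- function(node) {
      kid.names <- names(node)
      to.remove <- kid.names[!(kid.names %in% c("ID", "L1N2"))]
      removeChildren(node, kids = to.remove)
    }
    # Select the L1 nodes that contain an L2, then trim each one
    lapply(xml1["//L1[descendant::L2]"], trim)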
    #  [[1]]
    # <L1>
    #   <ID>Get this ID</ID>
    #   <L1N2>
    #     <L2>
    #       <L2N1>Get this node and all others in L2</L2N1>
    #     </L2>
    #   </L1N2>
    # </L1> 
    # 
    # [[2]]
    # <L1>
    #   <ID>Get this ID</ID>
    #   <L1N2>
    #     <L2>
    #       <L2N1>Get this node and all others in L2</L2N1>
    #     </L2>
    #   </L1N2>
    # </L1> 
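The same pruning can also be driven by XPath instead of a hard-coded list of names. The sketch below assumes the XML package's removeNodes() and a compact copy of xml1 (same element names, no whitespace inside each L1). In one query, it selects every child of an L2-bearing L1 that is neither ID nor an ancestor of an L2, then unlinks those nodes. Like trim(), this edits xml1 in place.

```r
library(XML)

# Compact copy of the question's xml1 (same element names, shorter text)
xml1 <- xmlParse('<L0>
  <L1><ID>Get this ID</ID><L1N1>Ignore node 1</L1N1><L1N2><L2><L2N1>Get this node</L2N1></L2></L1N2><L1N3>Ignore node 3</L1N3></L1>
  <L1><ID>Get this ID</ID><L1N1>Ignore node 1</L1N1><L1N2><L2><L2N1>Get this node</L2N1></L2></L1N2><L1N4>Ignore node 4</L1N4></L1>
  <L1><ID>Ignore this ID</ID><L1N1>Ignore node 1</L1N1><L1N3>Ignore node 3</L1N3><L1N4>Ignore node 4</L1N4></L1>
</L0>', asText = TRUE)

# Children of an L2-bearing L1 that are neither ID nor wrap an L2
unwanted <- getNodeSet(xml1, "//L1[.//L2]/*[not(self::ID or .//L2)]")

# Unlink them from the tree (this modifies xml1 in place)
removeNodes(unwanted)

# The L1 nodes that contain an L2 now hold only ID and L1N2
pruned <- getNodeSet(xml1, "//L1[.//L2]")
pruned
```

Because the predicate is written in terms of structure (`.//L2`) rather than element names, it does not need updating if other uncle elements such as L1N5 appear later.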
    

    Of course, you can also use an anonymous function and put it all on one line:

    lapply(xml1["//L1[descendant::L2]"],function(node) removeChildren(node,kids=names(node)[!(names(node)%in%c("ID","L1N2"))]))
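The XML package's internal nodes are references into the parsed document, so both versions above remove those children from xml1 itself. Any later query on xml1 sees the trimmed tree. If the original must stay intact, one option is to trim a deep copy of each L1 instead. The sketch below assumes xmlClone() from the same package. The helper name trim_copy and its keep argument are hypothetical, and xml1 is again a compact copy of the question's document.

```r
library(XML)

# Compact copy of the question's xml1 (same element names, shorter text)
xml1 <- xmlParse('<L0>
  <L1><ID>Get this ID</ID><L1N1>Ignore node 1</L1N1><L1N2><L2><L2N1>Get this node</L2N1></L2></L1N2><L1N3>Ignore node 3</L1N3></L1>
  <L1><ID>Get this ID</ID><L1N1>Ignore node 1</L1N1><L1N2><L2><L2N1>Get this node</L2N1></L2></L1N2><L1N4>Ignore node 4</L1N4></L1>
  <L1><ID>Ignore this ID</ID><L1N1>Ignore node 1</L1N1><L1N3>Ignore node 3</L1N3><L1N4>Ignore node 4</L1N4></L1>
</L0>', asText = TRUE)

# Hypothetical helper: trim a deep copy of an L1 node, leaving xml1 untouched
trim_copy <- function(node, keep = c("ID", "L1N2")) {
  copy      <- xmlClone(node)          # copies the node and its whole subtree
  kid.names <- names(copy)
  removeChildren(copy, kids = kid.names[!(kid.names %in% keep)])
  copy
}

trimmed <- lapply(xml1["//L1[descendant::L2]"], trim_copy)
trimmed
```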
    
