[Question title]: Determine which tags within an xpath string have a class
[Posted]: 2021-03-12 12:27:12
[Question description]:

My data:

  xpath <- "/html/body[@class=\"das\"]/div/div[@class=\"ddd\"]/div[@class=\"spasd\"]/div[@class=\"josner(m/doandel\"]/p[@class=\"ficp\"]/a/strong[@class=\"asd\"]"

Question:

How can I determine which tags in my xpath string have a class?

Expected output:

c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE)

What I tried:

  library(magrittr)  # provides %>%

  # Try 1: splitting on "/" fails when a class value itself contains "/"
  xpath %>% strsplit(split = "/") %>% .[[1]]
  
  # Try 2: drop the leading "/", split after each "[...]" predicate,
  # then count the "/" left in each piece
  xp <- (strsplit(xpath, split = "")[[1]][-1]) %>% paste(collapse = "")
  rr <- strsplit(xp, split = "\\[(?:.*?)\\]/")[[1]] %>% stringr::str_count(pattern = "/")
  has_class <- lapply(rr, function(r) rep(!r, r + 1)) %>% unlist

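Edit:

Now I'm thinking: if I have the corresponding document, I can look up the target tag, walk up the tree, and check each node's class.

[Question comments]:

    Tags: r xpath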


    [Solution 1]:

    Self-answer:

    If I have the corresponding document, I can look up the target tag, walk up the tree, and check each node's class.

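      library(rvest)     # html_nodes(), html_attr(), html_name()
      library(magrittr)  # %>% and %<>%

      # `doc` is assumed to be the parsed document that `xpath` refers to
      classes <- c()
      node <- html_nodes(doc, xpath = xpath)[1]  # first node matching the xpath
      name <- html_name(node)
      # Walk up the tree, recording each node's class (NA when absent)
      while (name != "html") {
        classes <- c(classes, html_attr(node, name = "class"))
        node %<>% html_nodes(xpath = "..")
        name <- html_name(node)
      }
      # Add the <html> node's class, then reverse into root-to-target order
      classes <- rev(c(classes, html_attr(node, name = "class")))
      has_classes <- !is.na(classes)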
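
When the document is not at hand, the string itself can still be inspected. Splitting on "/" breaks because a class value may contain one, so the sketch below matches whole location steps instead and treats quoted strings inside a predicate as opaque. This is only a minimal base-R sketch, not a full XPath parser: the helper name `xpath_steps_have_class` is made up for illustration, and it assumes an absolute path of simple steps whose predicates look like `[@class="..."]`.

```r
# Hypothetical helper (not part of the original post): for each location
# step of an XPath string, report whether it carries an @class predicate.
xpath_steps_have_class <- function(xpath) {
  # One step = "/", a node test, then zero or more [...] predicates.
  # Inside a predicate, quoted strings are consumed as a whole, so a "/"
  # or "]" inside a class value does not end the step early.
  step_pattern <- "/[^/\\[]+(?:\\[(?:[^\\]\"']|\"[^\"]*\"|'[^']*')*\\])*"
  steps <- regmatches(xpath, gregexpr(step_pattern, xpath, perl = TRUE))[[1]]
  # Assumption: a step "has a class" if a predicate begins with @class=
  grepl("\\[\\s*@class\\s*=", steps, perl = TRUE)
}

xpath <- "/html/body[@class=\"das\"]/div/div[@class=\"ddd\"]/div[@class=\"spasd\"]/div[@class=\"josner(m/doandel\"]/p[@class=\"ficp\"]/a/strong[@class=\"asd\"]"
has_class <- xpath_steps_have_class(xpath)
print(has_class)
# FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
```

Unlike the tree walk, this only reports what the xpath string claims. A tag that has a class in the real document but no `@class` predicate in the string is reported as `FALSE`.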
    

    [Comments]:
