【Question title】: How to parse non-XML as XML? [closed]
【Posted】: 2017-04-20 02:11:55
【Question】:

http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V

How can I turn this into an XML document? I'm trying to parse it in R.

【Discussion】:

  • It is an XML document with a valid structure, and read_xml reads it without any problem. What have you tried that shows it doesn't work?

Tags: r xml


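【Solution 1】:

You can read and parse it with xml2, then reshape the nodes into a table with purrr's map_df and tidyr's unnest (both loaded by tidyverse):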

library(xml2)
library(tidyverse)

xml <- read_xml('https://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V')

bart <- xml %>% xml_find_all('//station') %>%    # select all station nodes
    map_df(as_list) %>%    # coerce each node to list, collect to data.frame
    unnest()    # unnest list columns of data.frame

bart
#> # A tibble: 46 × 9
#>                            name  abbr gtfs_latitude gtfs_longitude
#>                           <chr> <chr>         <chr>          <chr>
#> 1  12th St. Oakland City Center  12TH     37.803768    -122.271450
#> 2              16th St. Mission  16TH     37.765062    -122.419694
#> 3              19th St. Oakland  19TH     37.808350    -122.268602
#> 4              24th St. Mission  24TH     37.752470    -122.418143
#> 5                         Ashby  ASHB     37.852803    -122.270062
#> 6                   Balboa Park  BALB     37.721585    -122.447506
#> 7                      Bay Fair  BAYF     37.696924    -122.126514
#> 8                 Castro Valley  CAST     37.690746    -122.075602
#> 9         Civic Center/UN Plaza  CIVC     37.779732    -122.414123
#> 10                     Coliseum  COLS     37.753661    -122.196869
#> # ... with 36 more rows, and 5 more variables: address <chr>, city <chr>,
#> #   county <chr>, state <chr>, zipcode <chr>
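If the API key no longer works, or you want to test the parsing step offline, the minimal sketch below builds the same kind of table from a literal XML string using only xml2 and base R. It assumes the response nests `<station>` elements under `<root><stations>`, as the XPath in the next answer suggests. The `sample_xml` string is a hypothetical, trimmed two-station sample whose values are copied from the output above. Each field is pulled with `xml_find_first()` on every station, so a station missing a child element yields `NA` rather than shifting the columns.

```r
library(xml2)

# Hypothetical, trimmed sample assumed to mirror the BART response
# (root > stations > station > fields); values copied from the output above.
sample_xml <- '<root>
  <stations>
    <station>
      <name>12th St. Oakland City Center</name>
      <abbr>12TH</abbr>
      <gtfs_latitude>37.803768</gtfs_latitude>
      <gtfs_longitude>-122.271450</gtfs_longitude>
    </station>
    <station>
      <name>16th St. Mission</name>
      <abbr>16TH</abbr>
      <gtfs_latitude>37.765062</gtfs_latitude>
      <gtfs_longitude>-122.419694</gtfs_longitude>
    </station>
  </stations>
</root>'

doc <- read_xml(sample_xml)                 # a string containing "<" is parsed as literal XML
stations <- xml_find_all(doc, "//station")  # one node per <station>

# For each field, take the first matching child of every station;
# a missing child gives NA, so columns stay aligned.
fields <- c("name", "abbr", "gtfs_latitude", "gtfs_longitude")
bart <- as.data.frame(
  lapply(setNames(fields, fields),
         function(f) xml_text(xml_find_first(stations, paste0("./", f)))),
  stringsAsFactors = FALSE
)

# Coordinates arrive as text; convert them to numbers.
bart$gtfs_latitude  <- as.numeric(bart$gtfs_latitude)
bart$gtfs_longitude <- as.numeric(bart$gtfs_longitude)
bart
```

Swapping `read_xml(sample_xml)` for `read_xml()` on the API URL should give one row per station, assuming the live response keeps this nesting.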


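【Solution 2】:

Use the rvest library. The basic idea is to find the nodes of interest with an XPath selector (html_nodes) and then get their values with xml_text.

library(rvest)  # html_nodes() and the %>% pipe
library(xml2)   # read_xml() and xml_text()

doc <- read_xml("http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V")
station_names <- doc %>%
  html_nodes(xpath = "/root/stations/station/name") %>%  # every <name> of every <station>
  xml_text()                                             # text content of each node

station_names[1:5]

# [1] "12th St. Oakland City Center" "16th St. Mission"             "19th St. Oakland"             "24th St. Mission"
# [5] "Ashby"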
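To see what the XPath selector does without calling the API, here is a small offline sketch on a hypothetical two-station sample (the same nesting as the real response is assumed). Besides the absolute path used above, it shows a predicate that selects one station by its `<abbr>` value, and an XPath `count()` expression evaluated with xml2's `xml_find_num()`.

```r
library(xml2)

# Hypothetical two-station sample with the nesting assumed for the BART response.
doc <- read_xml('<root><stations>
  <station><name>12th St. Oakland City Center</name><abbr>12TH</abbr></station>
  <station><name>16th St. Mission</name><abbr>16TH</abbr></station>
</stations></root>')

# Absolute path: every <name> under root/stations/station.
all_names <- xml_text(xml_find_all(doc, "/root/stations/station/name"))

# Predicate: only the <name> of the station whose <abbr> is '16TH'.
mission <- xml_text(xml_find_first(doc, "//station[abbr = '16TH']/name"))

# Evaluate an XPath function that returns a number instead of nodes.
n_stations <- xml_find_num(doc, "count(//station)")

all_names
mission
n_stations
```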
    


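【Solution 3】:

I had some problems passing the URL directly to read_html, so I first read the response with readLines and joined the lines into one string. After that, xml_find_all finds the node set of all <station> elements; converting it to a list and passing it to data.table::rbindlist gives one row per station. The idea of using rbindlist came from here.

library(xml2)
library(data.table)
library(magrittr)  # %>% is not exported by xml2 or data.table

url <- "http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V"
# readLines() returns one string per line; join them into a single document
nodesets <- read_html(paste(readLines(url), collapse = "\n")) %>%
    xml_find_all(".//station")            # all <station> nodes at any depth
data.table::rbindlist(as_list(nodesets))  # bind one row per station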
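The readLines route can also be reproduced offline. In the sketch below, the character vector `lines` is a hypothetical stand-in for what `readLines()` would return (one string per line). The lines are joined before parsing with `read_xml()`, since the content is XML, and the row-binding idea behind `rbindlist` is mimicked with base R's `do.call(rbind, ...)` so the example needs only xml2. Each station's child elements become the columns of a one-row data frame.

```r
library(xml2)

# Hypothetical stand-in for the output of readLines(): one string per line.
lines <- c('<?xml version="1.0" encoding="utf-8"?>',
           '<root><stations>',
           '<station><name>Ashby</name><abbr>ASHB</abbr></station>',
           '<station><name>Balboa Park</name><abbr>BALB</abbr></station>',
           '</stations></root>')

# Join the lines so the parser sees one complete document.
doc <- read_xml(paste(lines, collapse = "\n"))
stations <- xml_find_all(doc, ".//station")

# Turn each <station> into a one-row data frame named after its child elements.
rows <- lapply(stations, function(s) {
  kids <- xml_children(s)
  vals <- as.list(setNames(xml_text(kids), xml_name(kids)))
  as.data.frame(vals, stringsAsFactors = FALSE)
})

# Stack the rows (columns are matched by name).
bart <- do.call(rbind, rows)
bart
```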
      
