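【Posted】: 2017-06-23 16:00:04
【Question】:
I'm trying to convert the XML file produced by retrosheet boxscore into a data frame that can be inserted into a SQL table. I'm most of the way there, but I can't figure out how to get the attributes of an intermediate XML node. Below is a sample; hopefully I pasted it correctly. What I want to get is the game_id (from boxscore), the id (from player), and the complete set of batting attributes.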
<boxscores>
<boxscore game_id="CHA191204110" date="1912/04/11" site="CHI10"
visitor="SLA" visitor_city="St.Louis" visitor_name="Browns" home="CHA"
home_city="Chicago" home_name="White Sox" start_time="0:00PM"
day_night="day" temperature="0" wind_direction="unknown" wind_speed="-1"
field_condition="unknown" precip="unknown" sky="unknown" time_of_game="110"
attendance="30000" umpire_hp="evanb901" umpire_1b="eganr101" umpire_2b=""
umpire_3b="" >
<linescore away_runs="2" away_hits="7" away_errors="1" home_runs="6"
home_hits="10" home_errors="1">
<inning_line_score away="0" home="0" inning="1"/>
<inning_line_score away="0" home="0" inning="2"/>
<inning_line_score away="0" home="1" inning="3"/>
<inning_line_score away="0" home="0" inning="4"/>
<inning_line_score away="2" home="0" inning="5"/>
<inning_line_score away="0" home="1" inning="6"/>
<inning_line_score away="0" home="1" inning="7"/>
<inning_line_score away="0" home="3" inning="8"/>
<inning_line_score away="0" home="x" inning="9"/>
</linescore>
<players team="SLA" lob="5" dp="0" tp="0" risp_ab="0" risp_h="0">
<player id="shotb101" lname="Shotton" fname="Burt" slot="1" seq="1" pos="8">
<batting ab="4" r="0" h="0" d="0" t="0" hr="0" bi="0" bi2out="-1" bb="0" ibb="-1" so="3" gdp="-1" hp="0" sh="0" sf="-1" sb="0" cs="-1" />
<fielding pos="8" outs="24" po="1" a="0" e="0" dp="0" tp="0" bip="-1" bf="-1" />
</player>
<player id="austj101" lname="Austin" fname="Jimmy" slot="2" seq="1" pos="5">
<batting ab="4" r="0" h="1" d="0" t="0" hr="0" bi="0" bi2out="-1" bb="0" ibb="-1" so="1" gdp="-1" hp="0" sh="0" sf="-1" sb="0" cs="-1" />
<fielding pos="5" outs="24" po="0" a="3" e="0" dp="0" tp="0" bip="-1" bf="-1" />
</player>
<player id="stovg101" lname="Stovall" fname="George" slot="3" seq="1" pos="3" >
<batting ab="4" r="0" h="1" d="0" t="0" hr="0" bi="0" bi2out="-1" bb="0" ibb="-1" so="0" gdp="-1" hp="0" sh="0" sf="-1" sb="0" cs="-1" />
<fielding pos="3" outs="24" po="11" a="0" e="0" dp="0" tp="0" bip="-1" bf="-1" />
</player>
</players>
</boxscore>
</boxscores>
Here is the code I'm using:
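library(xml2)   # read_xml(), xml_find_all(), xml_attr(), xml_attrs()
library(dplyr)  # bind_rows()

box <- read_xml("Q:\\Sabermetrics\\Retrosheet\\download.folder\\unzipped\\1912.xml")
atbat <- xml_find_all(box, "//boxscore")

# Build one data frame per game, then stack them into a single result.
bind_rows(lapply(atbat, function(x) {
  # All batting nodes for this game.
  player <- try(xml_find_all(x, "./players/player/batting"), silent = FALSE)
  if (inherits(player, "try-error") || length(player) == 0) return(NULL)

  # One row per batting node, built from its attributes.
  bind_rows(lapply(player, function(y) {
    data.frame(t(xml_attrs(y)), stringsAsFactors = FALSE)
  })) -> player_dat

  # game_id is an attribute of the boxscore node itself.
  game_id <- try(xml_attr(x, "game_id"))
  if (inherits(game_id, "try-error") || length(game_id) == 0) return(NULL)
  player_dat$game_id <- game_id
  player_dat
})) -> player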
I'd like to end up with something like this:
game_id       player_id  ab  r  h  d  ....
CHA191204110  shotb101   4   0  0  0  ....
CHA191204110  austj101   4   0  1  0  ....
CHA191204110  stovg101   4   0  1  0  ....
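I tried copying the game_id code to pull "id" from the player node, but it didn't work. I also tried the paths ./players/player[@id] and ./players/player/@id, and those didn't work either. Using just @id didn't work.
I'm not sure what I'm doing wrong; at this point I'm just throwing things at the wall to see what sticks...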
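For reference, here is a minimal sketch of one way the target table might be built. It is not a confirmed fix for the code above. It assumes the xml2 package and uses a trimmed inline copy of the sample (two players, four batting attributes) in place of the real 1912.xml file; the names sample_xml, batting, and stats are made up for illustration. The idea is to start from each batting node and walk up the tree: xml_parent() reaches the enclosing player, and the XPath axis ancestor::boxscore reaches the game. It also shows that ./players/player[@id] selects player elements (the value still has to be read with xml_attr()), while .../@id selects attribute nodes whose value can be read with xml_text().

```r
library(xml2)

# Trimmed copy of the sample: two players and a few batting attributes only.
sample_xml <- '<boxscores>
  <boxscore game_id="CHA191204110">
    <players team="SLA">
      <player id="shotb101" lname="Shotton">
        <batting ab="4" r="0" h="0" d="0"/>
      </player>
      <player id="austj101" lname="Austin">
        <batting ab="4" r="0" h="1" d="0"/>
      </player>
    </players>
  </boxscore>
</boxscores>'

box <- read_xml(sample_xml)

# "player[@id]" filters for <player> elements that HAVE an id attribute;
# it returns the elements, so xml_attr() is still needed for the value.
ids_from_elements <- xml_attr(
  xml_find_all(box, "//boxscore/players/player[@id]"), "id"
)

# "player/@id" returns attribute nodes; xml_text() reads their values.
ids_from_attr_nodes <- xml_text(
  xml_find_all(box, "//boxscore/players/player/@id")
)

# One batting node per player across all games.
batting <- xml_find_all(box, "//boxscore/players/player/batting")

# One row per batting node. rbind() assumes every batting node carries the
# same attribute names, which holds for this sample.
stats <- do.call(rbind, lapply(batting, function(b) {
  data.frame(t(xml_attrs(b)), stringsAsFactors = FALSE)
}))

# Walk up from each batting node: the parent is <player>, and the
# enclosing <boxscore> is found with the ancestor axis.
stats$player_id <- vapply(batting, function(b) {
  xml_attr(xml_parent(b), "id")
}, character(1))
stats$game_id <- vapply(batting, function(b) {
  xml_attr(xml_find_first(b, "ancestor::boxscore"), "game_id")
}, character(1))

# Put the two identifier columns first.
id_cols <- c("game_id", "player_id")
stats <- stats[, c(id_cols, setdiff(names(stats), id_cols))]
print(stats)
```

All attribute values come back as character strings, so numeric columns such as ab or h would probably need converting (for example with as.integer()) before the rows are inserted into a SQL table.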
【Discussion】:
Tags: r xml-parsing