Need a little help converting an htmltab to a tibble

【Question Title】: Need a little help converting an htmltab to a tibble
【Posted】: 2019-02-11 01:13:42
【Question】:

I'm trying to help a friend get the Miami Dolphins football schedule into a tibble.

library(htmltab)
library(tidyr)
library(tibble)

url <- "http://www.espn.com/nfl/team/schedule/_/name/mia"
data <- htmltab(doc = url, which = 1, header = 2)

unique(data)

as_tibble(data)

It pulls in the same headers (variables) repeatedly. I'm missing something. I need a little help converting the htmltab result to a tibble. Thanks.

What the table should look like

【Comments】:

Tags: r html-table tidyr tibble

【Solution 1】:

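I used the "rvest" package to get the data from the website. I think the main problem is that the site does not provide a clean table format that can be used directly: the header row is repeated inside the table body. You have to clean it up to get the output you want.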

rm(list=ls())
library(tidyverse)
library(rvest)

##### get data from web #####
url = "http://www.espn.com/nfl/team/schedule/_/name/mia"
tb <- url %>%
  read_html() %>%
  html_table() # this function is actually going to read all tables at this url
rawdata = tb[[1]] # tb is a list and here we only want the first table

#### clean up the data #####
names(rawdata) = as.character(unlist(rawdata[2,])) # use the second row as the column names
tmp = rawdata[grepl("from",rawdata$TICKETS),] # keep rows whose TICKETS value contains "from" (the game rows)
tmp2 = tmp[,!duplicated(names(tmp))] # drop columns that have duplicated column names
res = as_tibble(tmp2) # convert to tibble

For the cleaning part, I worked it out step by step by looking at the data. Of course, there are many ways to do the same task.
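
The cleaning steps can also be tried offline on a small made-up table, which makes it easier to see what each step does. This is only a sketch: the `rawdata` values below are invented for illustration (they are not real schedule data), and the layout (a title row, a header row in row 2, a repeated header row, and a duplicated TICKETS column) is an assumption about what the scraped table roughly looks like. Here the duplicated columns are dropped before renaming, so the data frame never carries duplicate names.

```r
library(tibble)

# Hypothetical stand-in for the first table returned by html_table().
# All values are invented; only the messy shape matters.
rawdata <- data.frame(
  X1 = c("Regular Season", "WK", "1", "2", "WK", "3"),
  X2 = c("Regular Season", "DATE", "Sun, Sep 8", "Sun, Sep 15", "DATE", "Sun, Sep 22"),
  X3 = c("Regular Season", "TICKETS", "Tickets from $50", "Tickets from $60",
         "TICKETS", "Tickets from $70"),
  X4 = c("Regular Season", "TICKETS", "Tickets from $50", "Tickets from $60",
         "TICKETS", "Tickets from $70"),
  stringsAsFactors = FALSE
)

# Row 2 holds the real header
hdr <- as.character(unlist(rawdata[2, ]))

# Keep only the first column for each header name
keep_cols <- !duplicated(hdr)
clean <- rawdata[, keep_cols, drop = FALSE]
names(clean) <- hdr[keep_cols]

# Keep only the game rows: their TICKETS value contains "from",
# which drops the title row and every repeated header row
clean <- clean[grepl("from", clean$TICKETS), , drop = FALSE]

res <- as_tibble(clean)
print(res)
```

On the real page the filter pattern and the header row index may need adjusting, so it is worth inspecting `rawdata` first.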

【Comments】: