【Question title】: readHTMLTable error: arguments imply differing number of rows
【Posted】: 2015-08-26 10:59:02
【Question】:

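I hope someone can help me with the following problem. I have been working with the developers on using an undocumented API for the Danish website www.jobindsats.dk.

After a while I decided to take a different route and simply scrape the web page.

So what I actually want to do is: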

#open packages
library(XML)
library(xlsx)

#Define an unsexy URL
url<-'http://www.jobindsats.dk/jobindsats/sv/DatabankViewer/ShowResult?mGroupIds=mgrpY25I10_1%2CmgrpY25I10_2&AreaType=_nykom&AreaSort=population&AreaIds=146&FrequencyId=m&PeriodIds=2009M01%2C2009M02%2C2009M03%2C2009M04%2C2009M05%2C2009M06%2C2009M07%2C2009M08%2C2009M09%2C2009M10%2C2009M11%2C2009M12%2C2010M01%2C2010M02%2C2010M03%2C2010M04%2C2010M05%2C2010M06%2C2010M07%2C2010M08%2C2010M09%2C2010M10%2C2010M11%2C2010M12%2C2011M01%2C2011M02%2C2011M03%2C2011M04%2C2011M05%2C2011M06%2C2011M07%2C2011M08%2C2011M09%2C2011M10%2C2011M11%2C2011M12%2C2012M01%2C2012M02%2C2012M03%2C2012M04%2C2012M05%2C2012M06%2C2012M07%2C2012M08%2C2012M09%2C2012M10%2C2012M11%2C2012M12%2C2013M01%2C2013M02%2C2013M03%2C2013M04%2C2013M05%2C2013M06%2C2013M07%2C2013M08%2C2013M09%2C2013M10%2C2013M11%2C2013M12%2C2014M01%2C2014M02%2C2014M03%2C2014M04%2C2014M05%2C2014M06%2C2014M07%2C2014M08%2C2014M09%2C2014M10%2C2014M11%2C2014M12%2C2015M01%2C2015M02%2C2015M03%2C2015M04&_sektor=300&BenefitGroupId=Y25&MeasurementId=Y25I10&Name=&CubeId=star_y25i10&HasPivot=False&RowAxis=_omrade%2C_omrade_f3b%2C_sektor%2C_periode&ColumnAxis=MeasurementAxis#step3'

#Write a table
write.xlsx(readHTMLTable(url,header=T,stringsAsFactors=F, encoding="UTF-8"),file='Numbers.xlsx')

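I get the following error:

Error in data.frame(NULL = NULL, NULL = list(V1 = c("Hele landet", : arguments imply differing number of rows: 0, 76, 1

The reason (as far as I can tell) is that the first column contains only a single value, "Hele landet" ("the whole country").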

【Discussion】:

    Tags: xml r screen-scraping xlsx


    【Solution 1】:

    Why not just use the handy "Åbn i Excel" ("Open in Excel") link on the page?

    library(httr)
    
    url <- "http://www.jobindsats.dk/jobindsats/sv/DatabankViewer/ExportToExcel?BenefitGroupId=Y25&MeasurementId=Y25I10&AreaType=_nykom&FrequencyId=m&CubeId=star_y25i10&AreaSort=population&HasPivot=False&MGroupIds=mgrpY25I10_1%2CmgrpY25I10_2&AreaIds=146&PeriodIds=2009M01%2C2009M02%2C2009M03%2C2009M04%2C2009M05%2C2009M06%2C2009M07%2C2009M08%2C2009M09%2C2009M10%2C2009M11%2C2009M12%2C2010M01%2C2010M02%2C2010M03%2C2010M04%2C2010M05%2C2010M06%2C2010M07%2C2010M08%2C2010M09%2C2010M10%2C2010M11%2C2010M12%2C2011M01%2C2011M02%2C2011M03%2C2011M04%2C2011M05%2C2011M06%2C2011M07%2C2011M08%2C2011M09%2C2011M10%2C2011M11%2C2011M12%2C2012M01%2C2012M02%2C2012M03%2C2012M04%2C2012M05%2C2012M06%2C2012M07%2C2012M08%2C2012M09%2C2012M10%2C2012M11%2C2012M12%2C2013M01%2C2013M02%2C2013M03%2C2013M04%2C2013M05%2C2013M06%2C2013M07%2C2013M08%2C2013M09%2C2013M10%2C2013M11%2C2013M12%2C2014M01%2C2014M02%2C2014M03%2C2014M04%2C2014M05%2C2014M06%2C2014M07%2C2014M08%2C2014M09%2C2014M10%2C2014M11%2C2014M12%2C2015M01%2C2015M02%2C2015M03%2C2015M04&_sektor=300&RowAxis=_omrade%2C_periode&ColumnAxis=MeasurementAxis&Name="
    
    GET(url, write_disk("data.xlsx", overwrite=TRUE), progress(), verbose())
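
If scraping the HTML is still preferred, the error message in the question hints at a likely cause: the arguments passed to data.frame() look like a list of several tables (one NULL, one with 76 rows, one with 1 row) rather than a single table. The sketch below reproduces that shape with made-up data, so the `tables` list and its contents are assumptions and not the real page content. It then keeps only the largest table. It uses base R only, and the final write.xlsx() call is left as a comment.

```r
# Synthetic stand-in for a list of scraped tables (values are invented):
# a NULL entry, a 76-row table and a 1-row table.
tables <- list(
  NULL,
  data.frame(V1 = rep("Hele landet", 76), V2 = seq_len(76),
             stringsAsFactors = FALSE),
  data.frame(V1 = "other", stringsAsFactors = FALSE)
)

# Coercing the whole list to one data frame fails because the row counts differ.
res <- tryCatch(as.data.frame(tables), error = function(e) e)
print(inherits(res, "error"))

# Drop the NULL entries, then keep the table with the most rows.
tables <- Filter(Negate(is.null), tables)
main <- tables[[which.max(vapply(tables, nrow, integer(1)))]]
print(dim(main))

# A single data frame can then be written out, for example:
# xlsx::write.xlsx(main, file = "Numbers.xlsx", row.names = FALSE)
```

Picking the table with the most rows is only a heuristic. On a real page it is worth inspecting the list (for example with str(tables)) before deciding which element to export.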
    

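    【Comments】:

    • Wow. Amazing! Thanks. It gives a lot of warnings, though. Is that something I should worry about? It gives this error 23 times. 1: In curl::curl_fetch_disk(url, x$path, handle = handle) : progress callback must return boolean
    • Try upgrading the packages. You can also drop verbose() in production.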