【发布时间】:2015-08-26 10:59:02
【问题描述】:
我希望有人可以帮助我解决以下问题。我曾与开发人员一起为丹麦网页 www.jobindsats.dk 使用非文档 API。
过了一会儿,我决定换一种方式,只是抓取网页。
所以我真正想做的是
#open packages
library(XML)
library(xlsx)
#Define an unsexy URL
url<-'http://www.jobindsats.dk/jobindsats/sv/DatabankViewer/ShowResult?mGroupIds=mgrpY25I10_1%2CmgrpY25I10_2&AreaType=_nykom&AreaSort=population&AreaIds=146&FrequencyId=m&PeriodIds=2009M01%2C2009M02%2C2009M03%2C2009M04%2C2009M05%2C2009M06%2C2009M07%2C2009M08%2C2009M09%2C2009M10%2C2009M11%2C2009M12%2C2010M01%2C2010M02%2C2010M03%2C2010M04%2C2010M05%2C2010M06%2C2010M07%2C2010M08%2C2010M09%2C2010M10%2C2010M11%2C2010M12%2C2011M01%2C2011M02%2C2011M03%2C2011M04%2C2011M05%2C2011M06%2C2011M07%2C2011M08%2C2011M09%2C2011M10%2C2011M11%2C2011M12%2C2012M01%2C2012M02%2C2012M03%2C2012M04%2C2012M05%2C2012M06%2C2012M07%2C2012M08%2C2012M09%2C2012M10%2C2012M11%2C2012M12%2C2013M01%2C2013M02%2C2013M03%2C2013M04%2C2013M05%2C2013M06%2C2013M07%2C2013M08%2C2013M09%2C2013M10%2C2013M11%2C2013M12%2C2014M01%2C2014M02%2C2014M03%2C2014M04%2C2014M05%2C2014M06%2C2014M07%2C2014M08%2C2014M09%2C2014M10%2C2014M11%2C2014M12%2C2015M01%2C2015M02%2C2015M03%2C2015M04&_sektor=300&BenefitGroupId=Y25&MeasurementId=Y25I10&Name=&CubeId=star_y25i10&HasPivot=False&RowAxis=_omrade%2C_omrade_f3b%2C_sektor%2C_periode&ColumnAxis=MeasurementAxis#step3'
#Write a table
write.xlsx(readHTMLTable(url,header=T,stringsAsFactors=F, encoding="UTF-8"),file='Numbers.xlsx')
我收到以下错误:
data.frame 中的错误(
NULL= NULL,NULL= list(V1 = c("Hele landet", : 参数暗示不同的行数:0、76、1
原因是(据我所知)第一列仅包含一个值“Hele landet”(“整个国家”)。
【问题讨论】:
标签: xml r screen-scraping xlsx