【Posted】: 2019-01-31 09:09:46
【Question】:
I have the following HTML document:
library(rvest)
sess <- html_session("http://www.sudacka-mreza.hr/sudska-praksa.aspx", encoding = "UTF-8")
form <- html_form(sess)[[1]]
fill_form <- set_values(form,
                        'uc_login1$LoginUserName' = 'mislav.sagovac@contentio.biz',
                        'uc_login1$LoginPassword' = 'theanswer')
sess_submit <- submit_form(sess, fill_form, submit = "uc_login1$LoginSubmitButton", encoding = "UTF-8")
praxis <- sess_submit %>%
  jump_to("odluke.aspx?Search=&Search2=&Court=112&Type=---&Type1=---&Type1a=---&Type2=---&Type2a=---&Type3=&Type4=&O1=&O2=&O3=&O4=&P1=&P2=&ShowID=21216",
          encoding = "UTF-8")
decision <- read_html(praxis, encoding = "UTF-8") %>%
  html_nodes(xpath = "//*[@id='mainContent']")
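I want to save the decision as an HTML file. I tried several approaches (using write_html and write.table), but some UTF-8 characters are not displayed correctly in the resulting HTML file.
Attempted solutions: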
# first attempted solution
decision <- paste(as.character(decision), collapse = "\n")
write.table(decision,
            file = paste0("some_path.html"),
            quote = FALSE,
            col.names = FALSE,
            row.names = FALSE
            # fileEncoding = "UTF-8"
)
# second attempted solution
writeLines(iconv(decision,
                 from = "CP1252", to = "UTF8"),
           file(paste0("some_path.html"),
                encoding = "UTF-8"))
【Comments】:
-
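Are you getting an error? How exactly did you "check" the file to see whether the characters are correct? Are you on a Windows machine?
What does Encoding(decision) return after paste()? -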
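For anyone following along, the check asked about above could look roughly like this. It is only a sketch: the `decision` string here is a hypothetical stand-in for the scraped fragment (written with `\u` escapes so the script itself stays ASCII), not the real page content.

```r
# Hypothetical stand-in for the scraped fragment, collapsed with paste() as in the question
decision <- paste(c("<p>\u010dlanak \u0161est</p>", "<p>\u017dupanijski sud</p>"),
                  collapse = "\n")

# Declared encoding mark of the string
enc_mark <- Encoding(decision)

# Whether the bytes form valid UTF-8, independent of the declared mark
is_valid <- validUTF8(decision)

# Raw bytes of a single character; U+010D should be two bytes in UTF-8
c_bytes <- charToRaw("\u010d")

print(enc_mark)
print(is_valid)
print(c_bytes)
```

If the mark is "UTF-8" and the bytes are valid, the string in memory is probably fine and the damage happens when writing or viewing the file.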
I guess you want
writeLines(decision, "some_path.html", useBytes = TRUE), as described here. -
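A slightly fuller sketch of that suggestion, in case it helps with testing. The assumptions are mine, not from the question: `decision` is a hypothetical sample string, the output path is a temporary file, and the fragment is wrapped in a minimal page with a charset declaration, on the guess that a bare fragment leaves the browser to pick a legacy encoding. Reading the bytes back shows whether the file itself is correct.

```r
# Hypothetical stand-in for the scraped fragment (\u escapes keep this script ASCII-only)
decision <- "<div id='mainContent'>\u017dupanijski sud u \u0160ibeniku</div>"

# Assumption: wrap the fragment so the saved file declares its own encoding
page <- paste0(
  "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"></head>\n<body>\n",
  enc2utf8(decision),
  "\n</body>\n</html>"
)

out_path <- tempfile(fileext = ".html")  # hypothetical output path
con <- file(out_path, open = "wb")       # binary mode: no newline or encoding translation
writeLines(page, con, useBytes = TRUE)   # write the UTF-8 bytes unchanged
close(con)

# Read the raw bytes back and compare them with what was meant to be written
written <- readBin(out_path, what = "raw", n = file.size(out_path))
bytes_match <- identical(written, c(charToRaw(page), charToRaw("\n")))
print(bytes_match)
```

If the bytes match but the page still looks wrong, the viewer's encoding guess is the more likely culprit than the write step.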
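I opened the file after saving it locally. I have already looked at the link above, but none of the solutions there worked. I don't get an error; some characters are simply not displayed as UTF-8. I am on a Windows machine. The Encoding function returns "UTF-8".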
-
The line from the second comment does not work.
-
In the duplicate's answer, they save the file as txt, not html.