【Question title】: Weird characters in R
【Posted】: 2019-11-20 08:53:48
【Question】:

I am trying to load a .csv in R. I get something like

<f3>?<e9><U+00BC>?<e4><f3> . 

I have set the default text encoding to UTF-8 in Global Options. Could R be encoding the apostrophes in some special way on export?

df = read.csv("text.csv", encoding="UTF-8",header=TRUE, stringsAsFactors=FALSE)

####Original CSV (Open in Notepad++)####
I don?ó?é¼?äót want
Jes?ÇÖs in the Family
others that wasn?ó?é¼?äót resolved and told
Am really happy with the this ?ƒÿü,
new ?ó?é¼?ôunbreakable?ó?é¼?¥ 
on the freeway?Ǫ.

####Load in R####
I don?<f3>?<e9><U+00BC>?<e4><f3>t want
Jes?<c7><d6>s in the Family
others that wasn?<f3>?<e9><U+00BC>?<e4><f3>t resolved and told
Am really happy with the this ?<U+0083><ff><fc>
new ?<f3>?<e9><U+00BC>?<f4>unbreakable?<f3>?<e9><U+00BC>?<U+00A5> 
on the freeway?<U+01EA>.

####What I want####
Because I don't want
Jes's in the Family
others that wasn't resolved and told
Am really happy with the this ????
new 'unbreakable'
on the freeway….

Thanks.

【Comments】:

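  • What is the encoding of the csv file?
  • Where did you get the output in the "What I want" section?
  • Possible duplicate of stackoverflow.com/questions/4806823/… Note the guess_encoding function from the readr package recommended there. It may help solve your problem. The bottom line is that you need to find out the file's original encoding.
  • @JdM - I opened the file in Excel and saved it as csv (UTF-8)
  • @MichaelChirico That is my desired output (the data will be exported after replacing e.g. ?ó?é¼?äó)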
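
The advice in the comments (find the file's real encoding, then read the file with that encoding) could be sketched roughly as follows. This is only an illustration, not the asker's data: the temporary file and the sample word "café" are made up, and the file is assumed to be Latin-1. In read.csv, the encoding argument only marks strings as UTF-8 or Latin-1, whereas fileEncoding re-encodes the input. readr is optional here and is used only if it happens to be installed.

```r
# Build a tiny Latin-1 encoded CSV from raw bytes (0xE9 is "é" in Latin-1).
# The file name and contents are hypothetical, for illustration only.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("text\ncaf"), as.raw(0xE9), charToRaw("\n")), path)

# fileEncoding declares the file's encoding so the bytes are re-encoded on
# input; encoding = "UTF-8" alone would only label the strings.
df <- read.csv(path, fileEncoding = "latin1", stringsAsFactors = FALSE)
print(df$text)

# Optional: ask readr for a guess, if the package is available.
if (requireNamespace("readr", quietly = TRUE)) {
  print(readr::guess_encoding(path))
}
```

If the guess reports a different encoding for the real file, that name would be the value to try in fileEncoding.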

Tags: r utf-8 character-encoding


【Solution 1】:

You can do it like this:

Here x is the given data as a single string:

x <- "I don?ó?é¼?äót want Jes?ÇÖs in the Family others that wasn?ó?é¼?äót resolved and told Am really happy with the this ?ƒÿü, new ?ó?é¼?ôunbreakable? ?é¼?¥ on the freeway?Ǫ."

You can combine gsub and iconv to get almost the desired result. I am not sure how to get the smiley in your output:

 gsub("\\?+","'",iconv(x, "latin1", "ASCII", sub=""))

Output:

[1] "I don't want
     Jes's in the Family
     others that wasn't resolved and told
     Am really happy with the this ',
     new 'unbreakable'on the freeway'."
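
A less lossy variant of the same idea, assuming the text has been read with the correct encoding so that the curly quotes and the ellipsis arrive intact, is to map typographic punctuation to ASCII equivalents before dropping whatever is left. The sample string and the lookup vectors below are made up for illustration and are not part of the original answer.

```r
# Hypothetical sample: curly quotes and an ellipsis, written as escapes.
x <- "I don\u2019t want \u2018unbreakable\u2019 on the freeway\u2026"

# Parallel lookup vectors: typographic character -> ASCII replacement.
from <- c("\u2018", "\u2019", "\u201C", "\u201D", "\u2026")
to   <- c("'",      "'",      "\"",     "\"",     "...")

clean <- x
for (i in seq_along(from)) {
  # fixed = TRUE treats the pattern as a literal string, not a regex.
  clean <- gsub(from[i], to[i], clean, fixed = TRUE)
}

# Drop any remaining non-ASCII characters.
clean <- iconv(clean, "UTF-8", "ASCII", sub = "")
print(clean)
```

Unlike replacing every run of question marks, this leaves genuine question marks in the text alone.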

【Comments】:

    【Solution 2】:

    You should try converting from UTF-8 to ASCII:

    dt <- iconv(dt, 'utf-8', 'ascii', sub='')
    

    iconv is part of base R, so no extra package (such as 'tm') has to be loaded.
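
To see what the sub argument does in that call, here is a small sketch on made-up strings (dt below is hypothetical, not the asker's data). With sub = "" every non-ASCII character is silently dropped, so a curly apostrophe disappears rather than becoming '. With sub = "byte" the offending bytes are shown in hex, which helps when diagnosing what was in the input.

```r
# Hypothetical UTF-8 input containing an accented letter and a curly apostrophe.
dt <- c("caf\u00e9", "wasn\u2019t")

# sub = "" removes characters that cannot be represented in ASCII.
dropped <- iconv(dt, "UTF-8", "ASCII", sub = "")
print(dropped)

# sub = "byte" replaces each unconvertible byte with its hex code in <>.
shown <- iconv(dt, "UTF-8", "ASCII", sub = "byte")
print(shown)
```

Results can differ on platforms whose iconv implementation substitutes characters on its own.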

    【Comments】:
