【Question title】: Input from keyboard does not return correct Unicode character
【Posted】: 2020-12-18 09:35:15
【Question description】:

My values, stored in a UTF-8 CSV file, contain the Unicode character U+0103 (ă). This and other Vietnamese UTF-8 characters display correctly in the data frame.

ID     Subject
1      Ngữ văn
2      Toán
3      Địa lí

However, when I filter the data frame, this works:

df %>% filter(Subject == "Toán")

# A tibble: 1 x 2
 ID   Subject
<dbl> <chr>  
  1   Toán

But this does not:

df %>% filter(Subject == "Ngữ văn")

# A tibble: 0 x 2
# ... with 2 variables: ID <dbl>, Subject <chr>

I compared the string "Ngữ văn" as typed with a string in which ă is specified manually by its escape:

> "Ngữ văn"
[1] "Ngữ van"
> paste("Ngữ v","\u0103", "n", sep = "")
[1] "Ngữ văn"
> paste("Ngữ v","\u0103", "n", sep = "") == "Ngữ văn"
[1] FALSE

Why does typing the letter ă return a, and how can I fix this?

My session info:

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

【Discussion】:

    Tags: r unicode utf-8


    【Solution 1】:

    It works fine for me:

    library(dplyr)
    df %>%
       filter(Subject == "Ngữ văn" )
    #  ID Subject
    #1  1 Ngữ văn
    

    Data

    df <- structure(list(ID = 1:3, Subject = c("Ngữ văn", "Toán", "Địa lí"
    )), class = "data.frame", row.names = c(NA, -3L))
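If typed characters are altered before R sees them, as the question reports for ă, one possible workaround is to write the non-ASCII letters as `\u` escapes so the code stays pure ASCII. The following is a minimal sketch in base R under that assumption, using the same three rows; the names `target`, `hit`, and `cps` are introduced here for illustration only.

```r
# Build the comparison string from \u escapes so it does not depend on
# how the console transcodes typed input.
target <- "Ng\u1eef v\u0103n"   # U+1EEF is the u-horn-tilde, U+0103 is a-breve

# Same data as above, also written with escapes.
df <- data.frame(
  ID = 1:3,
  Subject = c("Ng\u1eef v\u0103n", "To\u00e1n", "\u0110\u1ecba l\u00ed"),
  stringsAsFactors = FALSE
)

# Base R equivalent of: df %>% filter(Subject == target)
hit <- df[df$Subject == target, ]

# Code points of the string; handy for checking what a typed string
# really contains (a plain "a" is 97, a-breve is 259).
cps <- utf8ToInt(target)
```

Comparing `utf8ToInt()` of a typed string against `cps` shows whether the sixth code point arrived as 259 (ă) or as 97 (a).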
    

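    【Comments】:

    • I already mentioned this above: typing "Ngữ văn" in the console gives "Ngữ van"
    • @rendiku Can you check your sessionInfo?
    • @akrun I have added my sessionInfo above
    • @rendiku I have a different locale setting, en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8, which may be the reason
    • @rendiku You can try this link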
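The locale difference raised in the comments can be checked directly. The sketch below uses base R's `Sys.getlocale()`, `l10n_info()`, and `enc2native()`; the variable names are illustrative. It rests on the assumption that the session's native encoding is what limits typed input. ă (U+0103) is not part of Windows code page 1252, which is the locale shown in the question.

```r
# Report the character-type locale and whether the session is UTF-8.
ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()
utf8_session <- isTRUE(info[["UTF-8"]])

# Round-trip a string containing U+0103 through the native encoding.
# In a UTF-8 session the code points are unchanged; in a session whose
# code page lacks the character they generally are not.
typed <- "v\u0103n"
round_trip <- enc2native(typed)
survives <- identical(utf8ToInt(enc2utf8(round_trip)), utf8ToInt(typed))
```

When `utf8_session` is `FALSE`, writing the string with `\u` escapes, or running R in a UTF-8 locale, are two options worth trying.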