【Posted】: 2020-12-18 09:35:15
【Question】:
Some values in my UTF-8 CSV file contain the Unicode character U+0103 ă. This and the other Vietnamese UTF-8 characters display correctly in the data frame.
ID Subject
1 Ngữ văn
2 Toán
3 Địa lí
However, when I filter the data frame, this works:
df %>% filter(Subject == "Toán")
# A tibble: 1 x 2
ID Subject
<dbl> <chr>
1 Toán
but this does not:
df %>% filter(Subject == "Ngữ văn")
# A tibble: 0 x 2
# ... with 2 variables: ID <dbl>, Subject <chr>
I compared the typed string "Ngữ văn" with a string in which ă is specified manually as an escape:
> "Ngữ văn"
[1] "Ngữ van"
> paste("Ngữ v","\u0103", "n", sep = "")
[1] "Ngữ văn"
> paste("Ngữ v","\u0103", "n", sep = "") == "Ngữ văn"
[1] FALSE
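One way to see exactly which characters a string holds is to list its code points. The sketch below is illustrative: `code_points`, `escaped` and `plain` are hypothetical names, and `plain` simply stands in for what a cp1252 console may produce from a typed "văn". Calling `code_points("văn")` in the affected console should show whether U+0103 survived the typing.

```r
# Hypothetical helper: show each character of a string as a "U+XXXX" code point.
code_points <- function(x) sprintf("U+%04X", utf8ToInt(enc2utf8(x)))

escaped <- "v\u0103n"  # ă written as an escape, independent of the console code page
plain   <- "van"       # stand-in for a typed "văn" whose breve was dropped

code_points(escaped)       # "U+0076" "U+0103" "U+006E"
code_points(plain)         # "U+0076" "U+0061" "U+006E"
identical(escaped, plain)  # FALSE
```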
Why does typing the letter ă give me a, and how can I fix this?
My session info:
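A possible workaround, sketched under the assumption that the strings read from the CSV are already correct UTF-8 (as the printed data frame suggests) and that only the typed literal is damaged: write the non-ASCII characters of the filter value as `\u` escapes, which R parses into the intended code points regardless of the cp1252 locale. The data frame below is a hypothetical stand-in for the CSV contents, and base-R indexing is used so the sketch needs no packages; `df %>% filter(Subject == target)` should behave the same way.

```r
# Hypothetical stand-in for the CSV data, built with escapes so the script
# behaves the same whatever the session's native encoding is.
df <- data.frame(
  ID = c(1, 2, 3),
  Subject = c("Ng\u1eef v\u0103n", "To\u00e1n", "\u0110\u1ecba l\u00ed"),
  stringsAsFactors = FALSE
)

# Filter value with ữ (U+1EEF) and ă (U+0103) given as escapes, not typed.
target <- "Ng\u1eef v\u0103n"

# Base-R equivalent of df %>% filter(Subject == target)
result <- df[df$Subject == target, ]
print(result)
```

If the CSV happened to store decomposed characters (for example a followed by a combining breve), the precomposed escape would still not match; normalizing both sides, for instance with `stringi::stri_trans_nfc()`, would be the next thing to try.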
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
【Comments】: