The factor() function can be used to associate a numeric vector with a set of labels. For example:
x <- c(1,1,1,2,3,3,2,3,4,4)
theLabels <- c("India","Canada","United States","Mexico")
y <- factor(x,1:4,theLabels)
y
...produces the following output:
> y <- factor(x,1:4,theLabels)
> y
[1] India India India Canada United States
[6] United States Canada United States Mexico Mexico
Levels: India Canada United States Mexico
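As a small sketch of how the mapping behaves (the vector `z` is a hypothetical addition, not part of the original question), the levels can be inspected directly, and a value that is absent from `levels` becomes NA rather than raising an error:

```r
x <- c(1, 1, 1, 2, 3, 3, 2, 3, 4, 4)
theLabels <- c("India", "Canada", "United States", "Mexico")

# same call as above, with the arguments named for readability
y <- factor(x, levels = 1:4, labels = theLabels)

levels(y)   # the labels, in the order given by `levels`
table(y)    # number of observations per label

# hypothetical input: 9 is not one of the declared levels, so it maps to NA
z <- factor(c(1, 2, 9), levels = 1:4, labels = theLabels)
is.na(z)
```

Checking for NA values after the conversion is a cheap way to catch codes that are missing from the label list.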
To demonstrate that this answer works with the data provided in the OP's fifth edit:
r <-c("India","Australia","Brazil","Canada","Indonesia","NewZealand",
"Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates")
zom<- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216))
zom$Country.Code <- factor(zom$Country.Code,
levels = c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216),
labels = r)
zom$Country.Code
...and the output:
> zom$Country.Code
[1] India Australia Brazil Canada Indonesia NewZealand Phillipines Qatar
[9] Singapore southAfrica SriLanka Turkey UAE UnitedKingdom UnitedStates
15 Levels: India Australia Brazil Canada Indonesia NewZealand Phillipines Qatar Singapore southAfrica SriLanka Turkey ... UnitedStates
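Note: once the original codes are converted to a factor, the underlying codes are lost. A side effect of factor() is that the values are stored internally as integers running from 1 to the number of unique labels associated with the factor.
An alternative to the factor() approach is to create a lookup table of country names and codes, and merge it with the original data. This approach preserves the original values of Country.Code.
To illustrate, we'll create a data frame containing multiple rows for each Country.Code from the OP, and merge it with the lookup table via dplyr::inner_join(). We'll then generate a cross-tabulation of Country.Name and Country.Code to show that the join is accurate.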
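A minimal sketch of that side effect, using only the first four codes from the OP's data (the names `codes`, `f`, and `df` are illustrative): the integer representation of the factor is 1 to 4, not the original codes. If the codes are still needed, one option is to store the factor in a new column instead of overwriting Country.Code.

```r
codes <- c(1, 14, 30, 37)
countries <- c("India", "Australia", "Brazil", "Canada")

f <- factor(codes, levels = codes, labels = countries)

as.integer(f)     # 1 2 3 4: positions in the level list, not 1 14 30 37
as.character(f)   # the labels; the original codes cannot be recovered from f

# keep the original codes by writing the factor to a separate column
df <- data.frame(Country.Code = codes)
df$Country.Name <- factor(df$Country.Code, levels = codes, labels = countries)
df
```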
library(dplyr)
# first, build a data frame containing multiple rows with the same country code
zom<- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216,
1,14,30,37,94,148,162,166,184,189,191,208,214,215,216,
1,14,30,37,94,148,162,166,184,189,191,208,214,215,216))
# second, create lookup table of codes and names, one row per country
countryNames <- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216),
Country.Name= c("India","Australia","Brazil","Canada","Indonesia","NewZealand",
"Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates"),
stringsAsFactors=FALSE)
# use dplyr::inner_join() to join country names
mergedData <- zom %>% inner_join(countryNames, by = "Country.Code")
table(mergedData$Country.Name,mergedData$Country.Code)
...and the output:
> table(mergedData$Country.Name,mergedData$Country.Code)
                1 14 30 37 94 148 162 166 184 189 191 208 214 215 216
  Australia     0  3  0  0  0   0   0   0   0   0   0   0   0   0   0
  Brazil        0  0  3  0  0   0   0   0   0   0   0   0   0   0   0
  Canada        0  0  0  3  0   0   0   0   0   0   0   0   0   0   0
  India         3  0  0  0  0   0   0   0   0   0   0   0   0   0   0
  Indonesia     0  0  0  0  3   0   0   0   0   0   0   0   0   0   0
  NewZealand    0  0  0  0  0   3   0   0   0   0   0   0   0   0   0
  Phillipines   0  0  0  0  0   0   3   0   0   0   0   0   0   0   0
  Qatar         0  0  0  0  0   0   0   3   0   0   0   0   0   0   0
  Singapore     0  0  0  0  0   0   0   0   3   0   0   0   0   0   0
  southAfrica   0  0  0  0  0   0   0   0   0   3   0   0   0   0   0
  SriLanka      0  0  0  0  0   0   0   0   0   0   3   0   0   0   0
  Turkey        0  0  0  0  0   0   0   0   0   0   0   3   0   0   0
  UAE           0  0  0  0  0   0   0   0   0   0   0   0   3   0   0
  UnitedKingdom 0  0  0  0  0   0   0   0   0   0   0   0   0   3   0
  UnitedStates  0  0  0  0  0   0   0   0   0   0   0   0   0   0   3
>
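If adding a dplyr dependency is not desirable, the same lookup can be sketched in base R with match(). This is a substitute technique rather than the approach shown above, and it uses a shortened, illustrative subset of the OP's codes. Unlike inner_join(), it keeps rows whose code is missing from the lookup table and gives them an NA name.

```r
# shortened example data: a few codes from the OP, some repeated
zom <- data.frame(Country.Code = c(1, 14, 30, 216, 1, 14))

# lookup table, one row per country
countryNames <- data.frame(Country.Code = c(1, 14, 30, 216),
                           Country.Name = c("India", "Australia", "Brazil", "UnitedStates"),
                           stringsAsFactors = FALSE)

# for each row of zom, find the position of its code in the lookup table
idx <- match(zom$Country.Code, countryNames$Country.Code)

# pull the matching names; Country.Code and the row order are left untouched
zom$Country.Name <- countryNames$Country.Name[idx]
zom
```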