【Posted】: 2019-09-01 08:29:51
【Question】:
> str(data$Installs)
 $ Installs: Factor w/ 21 levels "","0+","1+","1,000+",..: 8 20 15 18 11 17 17 5 5 8 ...
db$Installs = as.character(gsub("\\+", "", db$Installs))
str(db$Installs)
chr [1:10841] "10,000" "500,000" "5,000,000" "50,000,000" "100,000" "50,000" "50,000" "1,000,000" "1,000,000" "10,000" ...
db$Installs = as.double(gsub(",","",db$Installs))
str(db$Installs)
num [1:10841] 1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ...
I want the variable to look like this:
"10000" "500000" "5000000" "50000000" "100000" "50000" "50000" "1000000" "1000000" "10000" ...
I tried this code:
db$Installs.factor <- factor(db$Installs)
db$Installs = as.character(gsub("\\+", "", db$Installs))
db$Installs = as.double(gsub(",","",db$Installs))
【Comments】:
-
Try as.numeric(gsub(",", "", db$Installs, fixed=TRUE)) instead of as.double.
-
It still shows the same:
> str(db$Installs)
 chr [1:10841] "10,000" "500,000" "5,000,000" "50,000,000" "100,000" "50,000" "50,000" "1,000,000" "1,000,000" "10,000" ...
> db$Installs = as.numeric(gsub(",", "", db$Installs, fixed=TRUE))
> str(db$Installs)
 num [1:10841] 1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ...
I want the variable to look like this: "10000" "500000" "5000000" "50000000" "100000" "50000" "50000" "1000000" "1000000" "10000" ...
-
Please provide some sample data.
-
For this sample, c <- c("10,000", "500,000", "5,000,000", "50,000,000", "100,000", "50,000", "50,000", "1,000,000", "1,000,000", "10,000"), the above solution works.
-
You are getting the correct output (according to your str result): 1e+04 is 10000.
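To illustrate the point in the last comment, here is a minimal sketch, assuming a handful of values shaped like the ones in the question (the vector names installs_raw, installs_num and installs_chr are made up for this example). It suggests that the conversion is already correct and that only the display uses scientific notation; format() with scientific = FALSE is one way to get the fixed-notation strings the question asks for.

```r
# Hypothetical sample modelled on the question's data (not the real dataset)
installs_raw <- c("10,000+", "500,000+", "5,000,000+", "50,000,000+", "100,000+")

# Remove "+" and "," in one pass, then convert to numeric
installs_num <- as.numeric(gsub("[+,]", "", installs_raw))

# str() abbreviates the display; the stored values are the full numbers
str(installs_num)
installs_num[1] == 10000

# Character strings in fixed notation, without padding
installs_chr <- format(installs_num, scientific = FALSE, trim = TRUE)
print(installs_chr)

# Alternatively, discourage scientific notation when printing numerics
options(scipen = 999)
print(installs_num)
```

Keeping the column numeric and changing only how it is printed is usually preferable if the values will be used in calculations; the character version is mainly useful for display or export.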
Tags: r database dataframe data-cleaning