【Question title】: How to create a factor column based on another column?
【Posted】: 2018-03-08 20:17:51
【Question】:

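I want to create a column called "Region" based on the values of "Community Area", e.g. Community Area 1 = North, Community Area 2 = South. I want the result to look like this: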

Community area   Region
25               West
67               Southwest
39               South
40               South
25               West

I tried the following code, but it did not help:

region<-function(x){if(x==c(8,32,33)){crime$Region<-"Central"} 
else if(x==c(5,6,7,21,22)){crime$Region<-"North"}
else if(x==c(1:4,9:14,76,77)){crime$Region<-"Far North Side"}
else if(x==c(15:20)){crime$Region<-"Northwest Side"}
else if(x==c(23:31)){crime$Region<-"West"}
else if(x==c(34:43,60,69)){crime$Region<-"South"}
else if(x==c(56:59,61:68)){crime$Region<-"Southwest Side"}
else if(x==c(44:55)){crime$Region<-"Far Southeast Side"}
else if(x==c(70:75)){crime$Region<-"Far Southwest Side"}
else {crime$Region<-"Other"}
}
region(crime$Community.Area)

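【Comments】:

  • When asking for help, you should include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. What exactly goes wrong with the code you ran?
  • Try case_when from the dplyr package.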

Tags: r


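【Solution 1】:

A solution along the lines of the OP's approach is to modify the region function: test membership with %in% instead of ==, and return the label instead of assigning to crime$Region.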

  # Take one value at a time and return Region
  region<-function(x){if(x %in% c(8,32,33)){"Central"} 
    else if(x %in% c(5,6,7,21,22)){"North"}
    else if(x %in% c(1:4,9:14,76,77)){"Far North Side"}
    else if(x %in% c(15:20)){"Northwest Side"}
    else if(x %in% c(23:31)){"West"}
    else if(x %in% c(34:43,60,69)){"South"}
    else if(x %in% c(56:59,61:68)){"Southwest Side"}
    else if(x %in% c(44:55)){"Far Southeast Side"}
    else if(x %in% c(70:75)){"Far Southwest Side"}
    else {"Other"}
  }

# Use mapply to pass each value of `Community_Area` to region() and collect the results
df$Region <- mapply(region, df$Community_Area)

df
#  Community_Area         Region
#1             25           West
#2             67 Southwest Side
#3             39          South
#4             40          South
#5             25           West

Data

df <- data.frame(Community_Area = c(25, 67, 39, 40, 25))
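
Since the question asks for a factor column, a vectorized alternative to calling region() once per row may be worth considering. The following is only a sketch: `region_map`, `areas`, and `region_names` are hypothetical names introduced here, the ranges are copied from the region() function above, and the value 99 is added to the sample data purely to show the "Other" fallback.

```r
# Hypothetical lookup list built from the ranges in region() above.
region_map <- list(
  "Central"            = c(8, 32, 33),
  "North"              = c(5, 6, 7, 21, 22),
  "Far North Side"     = c(1:4, 9:14, 76, 77),
  "Northwest Side"     = 15:20,
  "West"               = 23:31,
  "South"              = c(34:43, 60, 69),
  "Southwest Side"     = c(56:59, 61:68),
  "Far Southeast Side" = 44:55,
  "Far Southwest Side" = 70:75
)

# Flatten the list into two parallel vectors: area code and region label.
areas        <- unlist(region_map, use.names = FALSE)
region_names <- rep(names(region_map), lengths(region_map))

# Sample data; 99 is an illustrative value that matches no region.
df <- data.frame(Community_Area = c(25, 67, 39, 40, 25, 99))

# match() finds the position of every area code in one vectorized call.
labels <- region_names[match(df$Community_Area, areas)]
labels[is.na(labels)] <- "Other"

# Store the result as a factor with an explicit level order.
df$Region <- factor(labels, levels = c(names(region_map), "Other"))
```

Because the mapping lives in data rather than in a chain of else if branches, adding or moving a community area only means editing `region_map`.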

【Comments】:

    【Solution 2】:

    For long expressions involving if / else if, try case_when from the dplyr package.

    > set.seed(1234)
    > 
    > df <- data.frame(x1 = round(runif(n = 20, min = 1, max = 4), 0), stringsAsFactors = F)
    > 
    > df
       x1
    1   1
    2   3
    3   3
    4   3
    5   4
    6   3
    7   1
    8   2
    ...
    20  2
    > 
    > df$Region <- dplyr::case_when(df$x1 == 1 ~ "North", 
    +                  df$x1 == 2 ~ "South", 
    +                  df$x1 == 3 ~ "East",
    +                  TRUE ~ "West")
    > df
       x1 Region
    1   1  North
    2   3   East
    3   3   East
    4   3   East
    5   4   West
    6   3   East
    7   1  North
    ...
    20  2  South
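
The same idea can be applied to the community areas from the question by combining case_when with %in%. This is a sketch rather than the answerer's code: it assumes dplyr is installed, reuses the ranges from the question, and uses the column name `Community_Area` from the sample data in Solution 1.

```r
library(dplyr)

df <- data.frame(Community_Area = c(25, 67, 39, 40, 25))

df <- df %>%
  mutate(
    # Conditions are checked top to bottom; the first TRUE one wins.
    Region = case_when(
      Community_Area %in% c(8, 32, 33)           ~ "Central",
      Community_Area %in% c(5, 6, 7, 21, 22)     ~ "North",
      Community_Area %in% c(1:4, 9:14, 76, 77)   ~ "Far North Side",
      Community_Area %in% 15:20                  ~ "Northwest Side",
      Community_Area %in% 23:31                  ~ "West",
      Community_Area %in% c(34:43, 60, 69)       ~ "South",
      Community_Area %in% c(56:59, 61:68)        ~ "Southwest Side",
      Community_Area %in% 44:55                  ~ "Far Southeast Side",
      Community_Area %in% 70:75                  ~ "Far Southwest Side",
      TRUE                                       ~ "Other"
    ),
    # Convert the character labels to a factor, as the question asks.
    Region = factor(Region)
  )
```

Every condition here has the same length as the column, which is what case_when expects.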
    

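    【Comments】:

    • case_when seems less time-consuming, but I got 16 warnings, and the resulting column has many NA values. Warning: 1: In is.na(e1) | is.na(e2) : longer object length is not a multiple of shorter object length
    • I am not sure what the e1 object is. Maybe next time you could show us an actual example of what you are trying to achieve. It looks like you are running case_when on data that does not match (in this case, vectors of different lengths).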
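
A length mismatch of this kind is a plausible cause when a column is compared to a set of values with == instead of %in%, as in the original region() function. The commenter's actual code is not shown, so the snippet below is only an illustration of the difference, using the sample values from the question.

```r
x <- c(25, 67, 39, 40, 25)

# `==` compares element by element and recycles the shorter vector, so this
# evaluates 25 == 8, 67 == 32, 39 == 33, 40 == 8, 25 == 32 and warns that the
# longer length is not a multiple of the shorter one.
recycled <- suppressWarnings(x == c(8, 32, 33))

# `%in%` asks, for each element of x, whether it occurs anywhere in the set,
# and always returns a logical vector of the same length as x.
member <- x %in% 23:31
```

Note also that if() expects a single TRUE or FALSE; in R 4.2.0 and later a condition of length greater than one is an error, which is another reason the original function cannot be applied to a whole column at once.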