【Question title】: Mapping column values
【Posted】: 2014-08-27 11:49:06
【Question description】:

I want to transform the values of a given column using some mapping function. Example:

df <- data.frame(A = 1:5, B = sample(1:20, 10))
df
   A  B
1  1 17
2  2  5
3  3  3
4  4 11
5  5 19
6  1 16
7  2  4
8  3  7
9  4  6
10 5  9

My goal is to map every element of column A as follows:

1 -> "tt"
2 -> "ff"
3 -> "ss"
4 -> "fs"
5 -> "sf"

I wrote the following:

mappingList <- c("tt", "ff", "ss", "fs", "sf")
df$A <- unlist(lapply(df$A, function(x){replace(x, x>0, mappingList[x])}))
df
  A  B
1  tt 17
2  ff  5
3  ss  3
4  fs 11
5  sf 19
6  tt 16
7  ff  4
8  ss  7
9  fs  6
10 sf  9

The code above works fine.

Now suppose a different data frame in which column A is made up not of the integers 1, 2, 3, 4, 5 but of arbitrary "generic" items, for example:

df <- data.frame(A = paste("str",1:5,sep=""), B = sample(1:20, 10))

or

df <- data.frame(A = seq(5, 25, by=5), B = sample(1:20, 10))

Question: how would you write the mapping in these cases?

【Question discussion】:

    Tags: r replace map dataframe transform


    【Solution 1】:

    Try:

    mappingList[df$A]
    #[1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf"
    

    For the other two data sets:

    df1 <-  data.frame(A = paste("str",1:5,sep=""), B = sample(1:20, 10))
    df2 <- data.frame(A = seq(5, 25, by=5), B = sample(1:20, 10))
    
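    mappingList[as.numeric(factor(df1$A))]  # factor() is needed in R >= 4.0.0, where data.frame() no longer converts strings to factors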
    #[1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf"
    
    mappingList[as.numeric(factor(df2$A))]
    #[1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf"
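Both calls above go through the integer codes of a factor, whose levels are sorted, so the mapping is only right when that sort order matches the order of mappingList (for example, "str10" would sort before "str2"). A base-R alternative, not part of this answer, pairs each original value explicitly with its label via match(); the keys and labels vectors below are illustrative.

```r
# Illustrative lookup: each key sits next to its label, so no sort order is assumed
keys   <- c(5, 10, 15, 20, 25)
labels <- c("tt", "ff", "ss", "fs", "sf")

df2 <- data.frame(A = seq(5, 25, by = 5), B = sample(1:20, 10))

# match() returns each value's position in keys (NA if absent); that position indexes labels
df2$A_mapped <- labels[match(df2$A, keys)]
df2$A_mapped
#  [1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf"

# Same pattern for string keys; match() converts factor input to character first
df1 <- data.frame(A = paste("str", 1:5, sep = ""), B = sample(1:20, 10))
str_mapped <- labels[match(df1$A, paste("str", 1:5, sep = ""))]
```

Values that do not appear in keys come back as NA rather than silently picking up a wrong label.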
    

    【Discussion】:

      【Solution 2】:

      Have you looked at factor?

      df$A_2 <- factor(df$A, levels = 1:5, labels = c("tt", "ff", "ss", "fs", "sf"))
      df
      #    A  B A_2
      # 1  1 17  tt
      # 2  2  5  ff
      # 3  3  3  ss
      # 4  4 11  fs
      # 5  5 19  sf
      # 6  1 16  tt
      # 7  2  4  ff
      # 8  3  7  ss
      # 9  4  6  fs
      # 10 5  9  sf
      

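      Basically, the levels argument should hold the original values to match, and the labels argument should hold the replacement values, in the same order.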
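A small sketch of the levels/labels pairing, using a hypothetical extra value 6 that is not listed in levels, to show what happens to unlisted values: factor() turns them into NA. as.character() converts the result to a plain character vector if a factor is not wanted.

```r
df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = 1:6)

# levels: original values to match; labels: the replacement for each level, same order
df$A_2 <- factor(df$A, levels = 1:5, labels = c("tt", "ff", "ss", "fs", "sf"))

# 6 is not among the levels, so its entry becomes NA
mapped_chr <- as.character(df$A_2)
print(levels(df$A_2))
print(mapped_chr)
```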


      You can also build a lookup table from a named vector.

      Example:

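      df <- data.frame(A = paste("str",1:5,sep=""), B = sample(1:20, 10))
      
      # names are the original values, elements are the replacements
      NamedVec <- setNames(c("tt", "ff", "ss", "fs", "sf"), paste("str",1:5,sep=""))
      NamedVec
      # str1 str2 str3 str4 str5
      # "tt" "ff" "ss" "fs" "sf"
      # as.character() makes a factor column match by name rather than by its integer code
      NamedVec[as.character(df$A)]
      # str1 str2 str3 str4 str5 str1 str2 str3 str4 str5
      # "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf"
      unname(NamedVec[as.character(df$A)])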
      #  [1] "tt" "ff" "ss" "fs" "sf" "tt" "ff" "ss" "fs" "sf"
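The named-vector lookup leaves open what happens to keys that have no entry: indexing by an unknown name returns NA. The sketch below uses an illustrative lookup vector and input x, including a hypothetical unmatched key "str9", and keeps the original value in that case; keeping, dropping, or flagging unmatched values is a design choice the answer does not specify.

```r
# Illustrative lookup table: names are original values, elements are replacements
lookup <- c(str1 = "tt", str2 = "ff", str3 = "ss", str4 = "fs", str5 = "sf")

x <- c("str1", "str3", "str9", "str5")  # "str9" has no entry in lookup

# Index by name; unname() drops the names carried over from lookup
mapped <- unname(lookup[as.character(x)])

# Unmatched keys come back as NA; here they fall back to their original value
mapped[is.na(mapped)] <- x[is.na(mapped)]
print(mapped)
```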
      

      【Discussion】:

      • Specifically: df$A <- factor(df$A, levels=c("str1","str2","str3","str4","str5"), labels=c("tt", "ff", "ss", "fs", "sf")) for the first data frame, and df$A <- factor(df$A, levels=c(5,10,15,20,25), labels=c("tt", "ff", "ss", "fs", "sf")) for the second.
      • Great, I like the idea of a lookup table built from a named vector. Thanks a lot, Ananda!
      • @Riad, related reading from SO user @PaulHiemstra: numbertheory.nl/2014/01/25/…