R: pmatch for a more difficult task

【Question title】: R: pmatch for a more difficult task
【Posted】: 2011-07-06 22:53:47
【Question description】:

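Thanks, @nullglob.

I tried running it again, but my output is different. If I have misused your code, would you mind showing me how to use it correctly? Sorry, I may have misunderstood how it works. I hope you don't mind giving me some more advice.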

df1 <- data.frame(
  A=c("x01","x02","y03","z02","x04", "x33", "z03"),
  B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz"))

df2 <- data.frame(
  X=c("a","b","c","d","e", "f"),
  Y=c("A01BB","A02","C02A","B04","C01GX", "xxx"))

with(c(df1,df2),{
  i <- pmatch(Y,B)
  iunmatched <- which(is.na(i))
  nunmatched <- length(iunmatched)
  nexcess <- length(B) - length(X)
  data.frame(A = c(A,rep(NA,nunmatched)),
             B = c(B,rep(NA,nunmatched)),
             X = c(X[i],rep(NA,nexcess),X[iunmatched]),
             Y = c(Y[i],rep(NA,nexcess),Y[iunmatched]))
})

       A  B  X  Y
    1  1  1  1  1
    2  2  2  2  2
    3  5  5  3  5
    4  6  3  4  3
    5  3  4  5  4
    6  4  6 NA NA
    7  7  7 NA NA
    8 NA NA  6  6

======================= Original question =====

Thank you for answering my previous question (http://stackoverflow.com/q/6592214/602276).

Building on that answer, I would like to use pmatch for a more difficult task.

df1 <- data.frame(
  A=c("x01","x02","y03","z02","x04", "x33", "z03"),
  B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02", "yyy", "zzz")
)

    A       B
1 x01 A01BB01
2 x02 A02BB02
3 y03 C02AA05
4 z02 B04CC10
5 x04 C01GX02
6 x33     yyy
7 z03     zzz

My df2 is modified as follows:

df2 <- data.frame(
  X=c("a","b","c","d","e", "f"),
  Y=c("A01BB","A02","C02A","B04","C01GX", "xxx")
)

  X     Y
1 a A01BB
2 b   A02
3 c  C02A
4 d   B04
5 e C01GX
6 f   xxx

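The difficulty is that df1 and df2 have different numbers of rows, so I cannot simply cbind df2 onto the right-hand side of df1.

In addition, some rows of df1 and df2 have no match, and the corresponding rows should be filled with NA accordingly.

The expected output is as follows: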

     A       B    X     Y
1  x01 A01BB01    a A01BB
2  x02 A02BB02    b   A02
3  y03 C02AA05    c  C02A
4  z02 B04CC10    d   B04
5  x04 C01GX02    e C01GX
6  x33     yyy   NA    NA
7  z03     zzz   NA    NA
8   NA      NA    f   xxx

Could you show me how to do this in R? Thank you very much.

【Question discussion】:

@Andrie, thank you for editing my question.

Tags: r match

【Solution 1】:

This is not exactly an elegant solution, but it seems to do the job:

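with(c(df1,df2),{
  # For each Y, the position of its unique partial match in B (NA if none)
  i <- pmatch(Y,B)
  # Rows of df2 whose Y matched nothing in B
  iunmatched <- which(is.na(i))
  nunmatched <- length(iunmatched)
  # How many more rows df1 has than df2
  nexcess <- length(B) - length(X)
  # Pad A and B at the bottom, and pad X and Y in the middle, so all
  # four columns have the same length. Indexing X and Y by i lines up
  # here because the matched rows sit at the same positions in df1 and df2.
  data.frame(A = c(A,rep(NA,nunmatched)),
             B = c(B,rep(NA,nunmatched)),
             X = c(X[i],rep(NA,nexcess),X[iunmatched]),
             Y = c(Y[i],rep(NA,nexcess),Y[iunmatched]))
})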

The output should be:

     A       B    X     Y
1  x01 A01BB01    a A01BB
2  x02 A02BB02    b   A02
3  y03 C02AA05    c  C02A
4  z02 B04CC10    d   B04
5  x04 C01GX02    e C01GX
6  x33     yyy <NA>  <NA>
7  z03     zzz <NA>  <NA>
8 <NA>    <NA>    f   xxx
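
The same idea can be written without relying on the matched rows being in the same order in both data frames. The following is only a sketch under assumptions, not the answerer's code: the names j, unmatched, pad and result are introduced here for illustration, and stringsAsFactors = FALSE is set explicitly so that the columns are character vectors on any R version.

```r
# Build the inputs as character columns (not factors) so c() keeps the strings.
df1 <- data.frame(
  A = c("x01", "x02", "y03", "z02", "x04", "x33", "z03"),
  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  X = c("a", "b", "c", "d", "e", "f"),
  Y = c("A01BB", "A02", "C02A", "B04", "C01GX", "xxx"),
  stringsAsFactors = FALSE
)

# i[k] is the row of df1 whose B is partially matched by df2$Y[k], or NA.
i <- pmatch(df2$Y, df1$B)

# Invert the mapping: for each row of df1, the row of df2 that matched it (or NA).
j <- match(seq_len(nrow(df1)), i)

# Rows of df2 that matched nothing are appended at the bottom.
unmatched <- which(is.na(i))
pad <- rep(NA, length(unmatched))

result <- data.frame(
  A = c(df1$A, pad),
  B = c(df1$B, pad),
  X = c(df2$X[j], df2$X[unmatched]),
  Y = c(df2$Y[j], df2$Y[unmatched]),
  stringsAsFactors = FALSE
)
print(result)
```

Inverting i with match() places each df2 row next to the df1 row it matched, whatever the row order, and it removes the need for the nexcess padding.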

【Discussion】:

       A  B  X  Y
    1  1  1  1  1
    2  2  2  2  2
    3  5  5  3  5
    4  6  3  4  3
    5  3  4  5  4
    6  4  6 NA NA
    7  7  7 NA NA
    8 NA NA  6  6

(I ran the code and got this. Do I need to run an additional step to get the original data from df1 and df2?) @nullglob
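
A likely cause of the integer output, offered as a sketch and not as a reply from the thread: before R 4.0.0, data.frame() converted strings to factors by default, and before R 4.1.0, c() applied to a factor returned its integer codes instead of its labels. The snippet below builds column B as a factor by hand and shows that its codes are the numbers seen in the B column of that output.

```r
# With stringsAsFactors = TRUE (the default before R 4.0.0) column B is a factor.
B <- factor(c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02", "yyy", "zzz"))

# Integer codes behind the factor (levels are sorted alphabetically): 1 2 5 3 4 6 7
print(as.integer(B))

# Converting to character before combining keeps the original strings.
print(c(as.character(B), NA))
```

Creating df1 and df2 with stringsAsFactors = FALSE, or wrapping the columns in as.character() before calling c(), avoids the problem on older R versions.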