Generating a matrix by applying a function to all possible combinations of variables in R

【Question title】: Generating a matrix by applying a function to all possible combinations of variables in R
【Posted】: 2014-04-29 03:32:48
【Question】:

Here is my small dataset, followed by a function:

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

The function:

syfun <- function (x, y){

if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
}
if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
    sxy = 0.5
}
if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
    sxy = 0
}
return(sxy)
}

out <- rep (NA, NROW(dat))

for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,1])
}

mean(out)
# [1] 1

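What I am doing here is applying the function to the first column (variable A1) paired with itself, row by row, and then averaging the output values. I want to save this result in one cell of a matrix.

The same goes for A1 versus B1: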

for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,2])
}
mean(out)
# [1] 0.25
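
The explicit loop above can also be written with mapply, which calls the scalar function once per row pair. This is only a sketch: `pair_mean` is a hypothetical helper name that does not appear in the original post, `stringsAsFactors = FALSE` is added so the columns are plain character vectors, and `sxy <- NA` is added so that unmatched pairs give NA instead of an error.

```r
dat <- data.frame(
  A1 = c("AA", "AA", "AA", "AA"),
  B1 = c("BB", "BB", "AB", "AB"),
  stringsAsFactors = FALSE
)

# Same three cases as the function in the question, with an NA default
syfun <- function(x, y) {
  sxy <- NA
  if (x == "AA" && y == "AA" || x == "BB" && y == "BB") sxy <- 1
  if (x == "AA" && y == "AB" || x == "AB" && y == "AA") sxy <- 0.5
  if (x == "AA" && y == "BB" || x == "BB" && y == "AA") sxy <- 0
  sxy
}

# Hypothetical helper: score two columns row by row, then average
pair_mean <- function(a, b) mean(mapply(syfun, a, b))

pair_mean(dat$A1, dat$A1)  # 1
pair_mean(dat$A1, dat$B1)  # 0.25
```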

Now, much like a correlation matrix, I want to store the results for all possible pairs of variables in a matrix:

     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00  NA
C1 0.50   NA 1.0

Edit: a more complete function that does not produce NA:

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  if (x == "BB" & y == "AB"| x == "AB" & y == "BB"){
        sxy = 0.5
  }

  if(x == "AB" & y ==  "AB") {
    sxy = 0.5
    }
  return(sxy)
}
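
The chain of if statements can also be expressed as a lookup table, which makes it easier to check that every genotype pair is covered. The following is a minimal sketch that assumes the same scores as the edited function above; `genotypes`, `score_tab` and `syfun_tab` are hypothetical names, not part of the original post.

```r
# Hypothetical lookup table holding the same scores as the edited syfun
genotypes <- c("AA", "AB", "BB")
score_tab <- matrix(
  c(1.0, 0.5, 0.0,
    0.5, 0.5, 0.5,
    0.0, 0.5, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(genotypes, genotypes)
)

# Vectorized scorer: index the table by (row name, column name) pairs.
# as.character() guards against factor columns.
syfun_tab <- function(x, y) {
  score_tab[cbind(as.character(x), as.character(y))]
}

syfun_tab(c("AA", "AB", "BB"), c("AB", "AB", "BB"))  # 0.5 0.5 1.0
```

Unlike the if-based version, this sketch works on whole columns at once, and it raises a subscript error for a genotype that is not in the table rather than returning NA.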

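【Question discussion】:

Tags: r function variables matrix apply

【Solution 1】:

From your example, it looks like you just want the ratio of the number of As in one column to the number of As in the other, as a measure of their similarity. If that is the case (I assume these are genes?):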

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

## This function takes two columns from dat, pastes all the genes together,
## then counts how often each letter appears. It then divides the smaller 'A'
## count by the larger one to give a similarity ratio between 0 and 1.
## (It only does this for 'A' right now, but it could be extended to more
## genes if necessary.)

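fun <- function(x, y){
  # collapse each column into one string, split it into single letters, count them
  x.prop <- table(unlist(strsplit(paste(x, collapse = ""), "")))
  y.prop <- table(unlist(strsplit(paste(y, collapse = ""), "")))
  # smaller 'A' count divided by the larger one (NA if a column contains no 'A')
  ans <- min(x.prop['A'], y.prop['A']) / max(x.prop['A'], y.prop['A'])
  return(ans)
}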
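
To make the counting step concrete, here is a small worked example for columns A1 and B1. It is a sketch only; `count_A` is a hypothetical helper introduced for illustration and is not part of the answer.

```r
a1 <- c("AA", "AA", "AA", "AA")
b1 <- c("BB", "BB", "AB", "AB")

# Hypothetical helper: number of 'A' letters in a column of genotypes
count_A <- function(col) {
  letters_in_col <- unlist(strsplit(paste(col, collapse = ""), ""))
  sum(letters_in_col == "A")
}

n_a1 <- count_A(a1)  # 8
n_b1 <- count_A(b1)  # 2

# smaller count divided by the larger count
min(n_a1, n_b1) / max(n_a1, n_b1)  # 0.25
```

Note that this compares overall letter counts per column rather than scoring the rows pairwise, so it does not have to agree with syfun in general, even though it gives the same 0.25 for A1 versus B1.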

final_mat <- matrix(ncol=3,nrow=3) ## creates an empty final matrix
colnames(final_mat) <- colnames(dat)  
rownames(final_mat) <- colnames(dat)


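### this applies 'fun' to each pair of column names; combn() returns the pairs
### in the same (column-major) order in which lower.tri() fills the matrix
pair_vals <- apply(combn(colnames(dat),2),2,function(x) fun(dat[,x[1]], dat[,x[2]]))
final_mat[lower.tri(final_mat)] <- pair_vals

### mirror the lower triangle into the upper triangle (stays correct for more than 3 columns)
final_mat[upper.tri(final_mat)] <- t(final_mat)[upper.tri(final_mat)]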

diag(final_mat) <- 1

final_mat
     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00 0.5
C1 0.50 0.50 1.0

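【Discussion】:

【Solution 2】:

First, your function syfun has to return NA when none of the conditions match. I therefore added one line at the top of the function: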

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  return(sxy)
}

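Second, you can use outer to apply the function to all combinations. Because outer passes whole vectors to its function argument, you need to vectorize the function with Vectorize: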

mat <- outer(names(dat), names(dat), function(x, y) 
  Vectorize(function(a, b) mean(Vectorize(syfun)(dat[[a]], dat[[b]])))(x,y))
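
The reason Vectorize is needed can be seen in a toy example: outer calls its function once with two expanded vectors, not once per cell. The names `calls`, `f` and `scalar_max` below are hypothetical and only serve the illustration.

```r
# outer() calls FUN a single time, with vectors of length 9 in this case
calls <- 0
f <- function(x, y) {
  calls <<- calls + 1
  x + y
}
m_sum <- outer(1:3, 1:3, f)
calls  # 1

# A function written for scalars has to be wrapped so that it is
# applied element by element to those long vectors
scalar_max <- function(x, y) if (x > y) x else y
m_max <- outer(1:3, 1:3, Vectorize(scalar_max))
m_max[1, 3]  # 3
m_max[2, 1]  # 2
```

In the answer's code the same idea is applied twice: the inner Vectorize runs syfun over the rows of two columns, and the outer one runs that column-level function over all pairs of column names.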

Third, replace the elements on the diagonal with 1:

diag(mat) <- 1

Fourth, set the row and column names:

dimnames(mat) <- list(names(dat), names(dat))

Result:

     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00  NA
C1 0.50   NA 1.0
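
The NA cells appear because this version of syfun has no rule for pairs such as BB/AB. As a rough sketch, the complete scoring rule from the question's edit can be plugged into a plain double loop to fill every cell. The switch-based rewrite of syfun is my own compact restatement of that rule, and `vars` is a hypothetical name; neither comes from the original answer.

```r
dat <- data.frame(
  A1 = c("AA", "AA", "AA", "AA"),
  B1 = c("BB", "BB", "AB", "AB"),
  C1 = c("AB", "BB", "AA", "AB"),
  stringsAsFactors = FALSE
)

# Compact restatement of the complete rule: sort the pair so order does not matter
syfun <- function(x, y) {
  pair <- paste(sort(c(x, y)), collapse = "/")
  switch(pair,
    "AA/AA" = 1, "BB/BB" = 1,
    "AA/AB" = 0.5, "AB/BB" = 0.5, "AB/AB" = 0.5,
    "AA/BB" = 0,
    NA_real_)  # anything else stays NA
}

vars <- names(dat)
mat <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
for (a in vars) {
  for (b in vars) {
    mat[a, b] <- mean(mapply(syfun, dat[[a]], dat[[b]]))
  }
}
diag(mat) <- 1  # forced to 1, as in the answer

mat["A1", "B1"]  # 0.25
mat["B1", "C1"]  # 0.625
```

Forcing the diagonal is a real choice here: because AB/AB scores 0.5, the computed value for B1 against itself would be 0.75 rather than 1.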

【Discussion】: