For loop only runs the last iteration when creating a new contingency table

【Question title】: For loop only running through the last iteration when creating new contingency table
【Posted】: 2019-08-16 06:36:22
【Question description】:

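The for loop I wrote computes expected values from the observed counts and stores them in a new contingency table (a copy of the one I made earlier). To compute an expected value, you multiply the row sum by the column sum and then divide by the grand total.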
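As a quick check of that rule, here is a minimal sketch for a single cell, using the margins of the observed table in this question. The variable names are made up for illustration.

```r
# Margins taken from the observed table in this question
row_sum_conservative <- 276   # total of the Conservative row
col_sum_frequently   <- 219   # total of the Frequently column
grand_total          <- 1349  # overall total

# Expected count for the Conservative/Frequently cell
expected <- row_sum_conservative * col_sum_frequently / grand_total
print(expected)  # 44.80652
```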

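I nested one for loop inside another to iterate over the observed contingency table, compute each expected value, and store it in the new expected table. However, when I run the code, it only computes the last iteration, that is, the value for cell [3,3].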

The observed table with added margins:
              Frequently Never Rarely  Sum
  Conservative         15   214     47  276
  Liberal             119   479    173  771
  Other                85   172     45  302
  Sum                 219   865    265 1349

The expected table:

               Frequently Never Rarely
  Conservative         15   214     47
  Liberal             119   479    173
  Other                85   172     45

viewsandpot is the name I gave to my data, which I have already read in from a file (so it is a table).

expecteddata <- function(rawdata){
  observedtable <- table(factor(rawdata[,2]), factor(rawdata[,1]))
  observedtable <- addmargins(observedtable)
  expectedtable <- observedtable
  i <- 1
  j <- 1
  ncol <- ncol(observedtable)
  nrow <- nrow(observedtable)
  for(i in nrow-1){
    j <- 1
    for(j in ncol-1){
      expectedtable[i,j] <- (observedtable[i, ncol]*observedtable[nrow, j])/observedtable[ncol, nrow]
      j <- j+1
    }
  }
  return(expectedtable)
}
expecteddata(viewsandpot)

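The expected-value contingency table should look like the table of observed counts, but with the counts replaced by the computed values (the numbers should be different).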

Only the last iteration works. The result I get from the code is:

             Frequently     Never    Rarely
  Conservative   15.00000 214.00000  47.00000
  Liberal       119.00000 479.00000 173.00000
  Other          85.00000 172.00000  59.32543

So 59.325 is the only number that changed.

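I don't know why the loop isn't working, given that the inner for loop should first replace the entire first row and then move on to the next row.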

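【Comments】:

As a side note, if tab is your original table, without any added row or column sums, this is just chisq.test(tab)$expected.
Your for loop only "loops" over a single value; you need a sequence such as for (i in 1:(nrow-1)). I make this mistake all the time.
How do we arrive at the numeric output you want?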
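To illustrate the comment about sequences, here is a small sketch. The names `n_rows`, `visited_single` and `visited_seq` are made up for this example. Iterating over `n_rows - 1` visits one value only, while `seq_len(n_rows - 1)` (or `1:(n_rows - 1)`) visits the whole range.

```r
n_rows <- 4  # number of rows including the Sum margin, as in the question

# Iterating over a single number: the body runs exactly once
visited_single <- c()
for (i in n_rows - 1) {
  visited_single <- c(visited_single, i)
}
print(visited_single)  # 3

# Iterating over a sequence: the body runs for 1, 2 and 3
visited_seq <- c()
for (i in seq_len(n_rows - 1)) {
  visited_seq <- c(visited_seq, i)
}
print(visited_seq)  # 1 2 3
```

Note that a for loop assigns the next element to the loop variable on every pass, so setting or incrementing `j` by hand inside the loop has no effect on which values are visited.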

Tags: r loops for-loop datatable return

【Solution 1】:

I think I finally figured it out; hopefully this is the solution you were looking for:

Frequently <- c(15, 119, 85) # counts in the "Frequently" column, one per row
Never <- c(214, 479, 172)
Rarely <- c(47, 173, 45)
# build the observed counts as a data frame to pass to the function later
data <- data.frame(Frequently, Never, Rarely, row.names = c("Conservative", "Liberal", "Other"))

expecteddata <- function(rawdata) {
  #make table to use with the dataframes first, second and third column
  observedtable <-matrix(data = c(rawdata[,1], rawdata[,2], rawdata[,3]), ncol=3)
  #make sum of rows and columns
  observedtable <- addmargins(observedtable)
  #make a dummy expectedtable with values from 1 to 9
  expectedtable <- matrix(1:9, ncol = 3)
  #sets the names of the columns and rows:
  colnames(expectedtable) <- c("Frequently", "Never", "Rarely")
  rownames(expectedtable) <- c("Conservative", "Liberal", "Other")

  ncol <- ncol(observedtable)
  nrow <- nrow(observedtable)            
  total <- observedtable[nrow, ncol]
  for (i in 1:(nrow - 1)) { # your version iterated over a single value; 1:(nrow - 1) is the range 1 to nrow-1 (ranges in R are written from:to)
    for (j in 1:(ncol - 1)) { # no need to reset or increment j yourself; the for loop handles that
      rowSum <- observedtable[i, ncol]
      colSum <- observedtable[nrow, j]
      expectedtable[i, j] <- (rowSum * colSum) / total
    }
  }
  return(expectedtable)
}
print(expecteddata(data))

This is the output:

             Frequently    Never    Rarely
Conservative   44.80652 176.9755  54.21794
Liberal       125.16605 494.3773 151.45663
Other          49.02743 193.6471  59.32543

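【Comments】:

Watch the operator precedence in your loops; I think it should be 1:(nrow - 1). For example, without the parentheses, 1:3 - 1 gives 0 1 2.
@Marius Yes, you are right. I printed the indices: 0 1 2 3. I'll correct it. It worked before as well, though :D, which is odd... the output after your correction is the same :)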
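The precedence issue from these comments can be reproduced with a short sketch (`n_rows` and `m` are illustrative names). The `:` operator binds more tightly than binary minus, and assigning at index 0 selects nothing, which is presumably why the uncorrected loop still produced the same table.

```r
n_rows <- 4

print(1:n_rows - 1)    # 0 1 2 3, parsed as (1:n_rows) - 1
print(1:(n_rows - 1))  # 1 2 3

# Assigning at row index 0 selects no cells, so the matrix is left unchanged
m <- matrix(1:4, nrow = 2)
m[0, 1] <- 99L
print(m)
```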

【Solution 2】:

# Dummy data
Conservative = c(15, 214, 47)
Liberal = c(119, 479, 173)
Other = c(85, 172, 45)
df = data.frame(Conservative,Liberal,Other)
df = as.data.frame(t(df))
names = c("Frequently", "Never", "Rarely")
colnames(df) <- names

# sums 
df$row_sum = rowSums(df)
colsum = colSums(df)
df = rbind(df,colsum)
row.names(df) = c("Conservative", "Liberal", "Other", "colsum" )


# Create custom iterator indices
col_index = c(1,2,3)
col_index = rep(col_index,3) # rep 3 times 
row_index = c(1,2,3)
row_index = rep(row_index, each=3) # rep each number total of 3 times

# Loop to calculate the output (rowsum * colsum) / total 
out = as.data.frame(matrix(vector(mode = 'numeric',length = 9), nrow = 3, ncol = 3))  # initialize output
for (i in 1:length(row_index)) { # iterate the length of the custom iteration index vectors 
  out[row_index[i],col_index[i]] = (df[4,col_index[i]] * df[row_index[i],4]) / df[4,4]
}

This gives the output:

> out
         V1       V2        V3
1  44.80652 176.9755  54.21794
2 125.16605 494.3773 151.45663
3  49.02743 193.6471  59.32543

【Comments】:

It should be colsum * rowsum / total, so for example the first number would be 219*276/1349.
See whether that is the case now.
Or outer(rowSums(df), colSums(df))/sum(df), if you are using rowSums and colSums.
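As a sketch of the loop-free approach from this comment, the following computes the whole expected table in one step and compares it with `chisq.test()`, which was suggested in the comments on the question. It assumes the observed counts are held in a plain matrix without margins; the names `observed`, `expected` and `expected_chisq` are chosen for this example.

```r
# Observed counts from the question, without margins
observed <- matrix(
  c(15, 214, 47,
    119, 479, 173,
    85, 172, 45),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Conservative", "Liberal", "Other"),
    c("Frequently", "Never", "Rarely")
  )
)

# Expected counts: (row sum * column sum) / grand total, for every cell at once
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 5))

# chisq.test() exposes the same expected counts in its $expected component
expected_chisq <- chisq.test(observed)$expected
print(all.equal(expected, expected_chisq, check.attributes = FALSE))
```

The rounded values should agree with the tables printed in both solutions, for example 44.80652 for Conservative/Frequently and 59.32543 for Other/Rarely.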