【Question title】: For loop only running through the last iteration when creating new contingency table
【Posted】: 2019-08-16 06:36:22
【Question】:

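The for loop I wrote calculates expected values from the observed values and stores them in a new contingency table (a copy of the one I made earlier). To calculate an expected value, you multiply the row sum by the column sum and then divide by the grand total.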
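
As a quick check of that formula, here is a minimal sketch for a single cell (Conservative / Frequently). The counts are copied from the observed table in this question; the variable names are only illustrative.

```r
# Observed counts copied from the table in the question (filled column by column)
observed <- matrix(c(15, 119, 85, 214, 479, 172, 47, 173, 45), nrow = 3,
                   dimnames = list(c("Conservative", "Liberal", "Other"),
                                   c("Frequently", "Never", "Rarely")))

row_total   <- sum(observed["Conservative", ])  # row sum for Conservative
col_total   <- sum(observed[, "Frequently"])    # column sum for Frequently
grand_total <- sum(observed)                    # total number of observations

# expected count = row sum * column sum / grand total
expected_cell <- row_total * col_total / grand_total
print(expected_cell)
```

With these counts the calculation is 276 * 219 / 1349, roughly 44.8.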

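I created a for loop nested inside another for loop that walks through the observed contingency table, calculates each expected value, and stores it in the new expected table. However, when I run the code, it only computes the last iteration, i.e. the cell [3,3].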

The observed table with added margins:
              Frequently Never Rarely  Sum
  Conservative         15   214     47  276
  Liberal             119   479    173  771
  Other                85   172     45  302
  Sum                 219   865    265 1349
The expected table:

               Frequently Never Rarely
  Conservative         15   214     47
  Liberal             119   479    173
  Other                85   172     45

viewsandpot is the name I gave my data, which I read in from a file (so it is a table).

expecteddata <- function(rawdata){
  observedtable <- table(factor(rawdata[,2]), factor(rawdata[,1]))
  observedtable <- addmargins(observedtable)
  expectedtable <- observedtable
  i <- 1
  j <- 1
  ncol <- ncol(observedtable)
  nrow <- nrow(observedtable)
  for(i in nrow-1){
    j <- 1
    for(j in ncol-1){
      expectedtable[i,j] <- (observedtable[i, ncol]*observedtable[nrow, j])/observedtable[ncol, nrow]
      j <- j+1
    }
  }
  return(expectedtable)
}
expecteddata(viewsandpot)

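The expected-value contingency table should look like the observed counts, but with the calculated values substituted in (so the numbers should differ).

Only the last iteration works. The result I get from the code is: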

             Frequently     Never    Rarely
  Conservative   15.00000 214.00000  47.00000
  Liberal       119.00000 479.00000 173.00000
  Other          85.00000 172.00000  59.32543

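So 59.325 is the only number that changed.

I don't know why the loop isn't working, given that the inner for loop should first replace the entire first row and then move on to the next row.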

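【Comments】:

  • As a side note, if tab is your original table, without any row or column sums added, this is just chisq.test(tab)$expected
  • Your for loop only "loops" over a single value; you need a sequence like for (i in 1:(nrow-1)). I make this mistake all the time.
  • How do we get the numeric output you want?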
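
To illustrate the comment about the loop visiting a single value, here is a small sketch. The value n = 4 is an assumption standing in for nrow of the table with margins (3 data rows plus the Sum row).

```r
n <- 4  # hypothetical: 3 data rows plus the "Sum" row

# Looping over a scalar: n - 1 is the single value 3, so the body runs once
visited_scalar <- c()
for (i in n - 1) {
  visited_scalar <- c(visited_scalar, i)
}

# Looping over a sequence: seq_len(n - 1) is 1, 2, 3, the same as 1:(n - 1)
visited_seq <- c()
for (i in seq_len(n - 1)) {
  visited_seq <- c(visited_seq, i)
}

print(visited_scalar)
print(visited_seq)
```

This is consistent with the result in the question: with both loops written as `for (i in nrow-1)` and `for (j in ncol-1)`, the only cell visited is [3,3]. Note also that the denominator in the question, observedtable[ncol, nrow], only picks the grand total because the table happens to be square; observedtable[nrow, ncol] is the general form.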

Tags: r loops for-loop datatable return


【Solution 1】:

I think I finally got it; hopefully this is the solution you wanted:

Frequently <- c(15, 119, 85) #a vector 
Never <- c(214, 479, 172)
Rarely <- c(47, 173, 45)
# store the observed counts in a data frame to pass to the function below
data <- data.frame(Frequently, Never, Rarely, row.names = c("Conservative", "liberal", "other"))

expecteddata <- function(rawdata) {
  # build a matrix from the first, second and third columns of the data frame
  observedtable <-matrix(data = c(rawdata[,1], rawdata[,2], rawdata[,3]), ncol=3)
  #make sum of rows and columns
  observedtable <- addmargins(observedtable)
  # make a placeholder expectedtable (values 1 to 9) that the loop overwrites
  expectedtable <- matrix(1:9, ncol = 3)
  #sets the names of the columns and rows:
  colnames(expectedtable) <- c("Frequently", "Never", "Rarely")
  rownames(expectedtable) <- c("Conservative", "Liberal", "Other")

  ncol <- ncol(observedtable)
  nrow <- nrow(observedtable)            
  total <- observedtable[nrow, ncol]
  for (i in 1:(nrow - 1)) { # iterate over the sequence 1..(nrow - 1); "for (i in nrow - 1)" visits only the single value nrow - 1
    for (j in 1:(ncol - 1)) { # no need to reset or increment j by hand; the for loop does that
      rowSum <- observedtable[i, ncol]
      colSum <- observedtable[nrow, j]
      expectedtable[i, j] <- (rowSum * colSum) / total
    }
  }
  return(expectedtable)
}
print(expecteddata(data))

This is the output:

             Frequently    Never    Rarely
Conservative   44.80652 176.9755  54.21794
Liberal       125.16605 494.3773 151.45663
Other          49.02743 193.6471  59.32543

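【Comments】:

  • Watch the order of operations in the loop: I think it should be 1:(nrow - 1). E.g. without the parentheses, 1:3 - 1 gives 0 1 2
  • @Marius Yes, you're right. I printed the indices: 0 1 2 3. I'll correct it. It also worked before, though :D, which is odd... The output after your correction is the same :)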
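
The precedence point in these comments can be sketched as follows. In R the `:` operator binds more tightly than binary minus, so `1:n - 1` is parsed as `(1:n) - 1`. The value n = 4 is again an assumed stand-in for nrow.

```r
n <- 4  # hypothetical stand-in for nrow of the table with margins

wrong <- 1:n - 1    # parsed as (1:n) - 1, so it starts at 0
right <- 1:(n - 1)  # the intended sequence

print(wrong)
print(right)

# Index 0 selects nothing in R, so an iteration with i = 0 reads a
# zero-length value instead of raising an error
x <- c(10, 20, 30)
print(length(x[0]))
```

That zero-length behaviour is a likely reason the uncorrected loop still produced the same output: the extra iteration at index 0 would have changed nothing, while indices 1 to 3 were still visited.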
【Solution 2】:
# Dummy data
Conservative = c(15, 214, 47)
Liberal = c(119, 479, 173)
Other = c(85, 172, 45)
df = data.frame(Conservative,Liberal,Other)
df = as.data.frame(t(df))
names = c("Frequently", "Never", "Rarely")
colnames(df) <- names

# sums 
df$row_sum = rowSums(df)
colsum = colSums(df)
df = rbind(df,colsum)
row.names(df) = c("Conservative", "Liberal", "Other", "colsum" )


# Create custom iterator indices
col_index = c(1,2,3)
col_index = rep(col_index,3) # rep 3 times 
row_index = c(1,2,3)
row_index = rep(row_index, each=3) # rep each number total of 3 times

# Loop to calculate the output (rowsum * colsum) / total 
out = as.data.frame(matrix(vector(mode = 'numeric',length = 9), nrow = 3, ncol = 3))  # initialize output
for (i in 1:length(row_index)) { # iterate the length of the custom iteration index vectors 
  out[row_index[i],col_index[i]] = (df[4,col_index[i]] * df[row_index[i],4]) / df[4,4]
} 

which gives the output:

> out
         V1       V2        V3
1  44.80652 176.9755  54.21794
2 125.16605 494.3773 151.45663
3  49.02743 193.6471  59.32543

【Comments】:

  • It should be colsum * rowsum / total, so for the first number, for example, it would be 219*276/1349
  • See if it's right now
  • outer(rowSums(df), colSums(df))/sum(df), if you use rowSums and colSums
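
The loop-free suggestions from the comments in this thread can be compared in a short sketch. It assumes the bare 3x3 counts with no margin row or column added (unlike df at the end of Solution 2); the matrix and variable names are illustrative.

```r
# Observed counts from the question, without margins (filled column by column)
observed <- matrix(c(15, 119, 85, 214, 479, 172, 47, 173, 45), nrow = 3,
                   dimnames = list(c("Conservative", "Liberal", "Other"),
                                   c("Frequently", "Never", "Rarely")))

# Outer product of row sums and column sums, divided by the grand total
expected_outer <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# chisq.test() computes the same expected counts as part of its result
expected_chisq <- chisq.test(observed)$expected

print(round(expected_outer, 5))
print(all.equal(expected_outer, expected_chisq, check.attributes = FALSE))
```

Both approaches avoid explicit index bookkeeping, which is where the original loop went wrong.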