How to correlate data with factor levels? Answers

【Question title】: How to correlate data with factor levels?
【Posted】: 2020-02-02 03:43:39
【Question】:

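I want to correlate some behaviours, coded as a factor, with a continuous covariate. The underlying motivation is that animals switch from searching (behaviour 1) to feeding (behaviour 2) as the covariate changes, e.g. as the distance to food shrinks.

So while an animal is in behaviour 1 the covariate should be large (far from food), and as the animal approaches and enters behaviour 2 (close to food) the covariate should become small. One wrinkle is that I have multiple animals.

My data looks like this: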

animalID behaviour 
1         1
1         1      
1         1
1         2
1         2
1         2
1         1
1         1
2         1
2         1
2         1
2         2
2         2
2         2
2         1

I'd like something like this:

animalID behaviour distance
1         1          100
1         1           99
1         1           98
1         2           58
1         2           57
1         2           60
1         1           74
1         1           75
2         1           104
2         1           101
2         1           100
2         2           40
2         2           44
2         2           42
2         1           86

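【Question discussion】:

How do you choose these values? It's not clear; for example, for the second animalID the distance starts at 104.
Please explain the numbers in the distance column in more detail.
Do you actually have a "distance", or do you want to make something up? This is also a time series, so wouldn't you expect some autocorrelation?
They represent the distance to food. The animal can speed up or slow down at each point, which is why the distance values jump suddenly.
I don't have distance; I want to create it just for testing. Yes, there will be some autocorrelation.

Tags: r correlation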

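【Solution 1】:

Since you don't have any covariate, there isn't much to work with. The simplest way to get something is to take a moving average of the behaviour codes and transform it as needed.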
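A minimal sketch of the moving-average idea in base R, assuming the example data from the question. The helper rolling_mean, the window width k = 3 and the linear mapping of behaviour 1 to a distance of 100 and behaviour 2 to 40 are all hypothetical choices for illustration. The smoothing is done within each animal via ave(), so a window never mixes observations from two animals.

```r
# Hypothetical helper: centred moving average that shrinks the window at the edges
rolling_mean <- function(x, k = 3) {
  half <- k %/% 2
  n <- length(x)
  vapply(seq_len(n), function(i) {
    mean(x[max(1, i - half):min(n, i + half)])
  }, numeric(1))
}

# Example data from the question
df <- data.frame(
  animalID  = c(rep(1, 8), rep(2, 7)),
  behaviour = c(1, 1, 1, 2, 2, 2, 1, 1,
                1, 1, 1, 2, 2, 2, 1)
)

# Smooth separately for each animal
smoothed <- ave(df$behaviour, df$animalID, FUN = rolling_mean)

# Hypothetical linear transform: behaviour 1 -> far (100), behaviour 2 -> near (40)
far <- 100
near <- 40
df$distance <- far - (smoothed - 1) * (far - near)
df
```

A larger k gives smoother, more strongly autocorrelated distances; adding rnorm() noise to df$distance would make the test data look more like the example in the question.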

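If you do have some covariates to use and want to do something more sophisticated, you can use a stochastic/Monte Carlo approach. The Stan language makes it easy to define Bayesian models and sample from them. In this case you can define a simple autoregressive (random-walk) model: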

data {
  int<lower=0> N;  // number of data points
  array[N] int<lower=0> animal;  // animal ID of each observation
  vector[N] behaviour;  // behaviour code (1 or 2) of each observation
}
parameters {
  vector[N] mu;  // the smoothed values you care about
  real<lower=0> sigma_auto;  // SD of the step between consecutive values (smaller = smoother)
  real<lower=0> sigma_behaviour;  // SD of the data around mu (how closely mu follows the data)
}
model {
  for (i in 2:N) {
    if (animal[i] == animal[i-1]) {
      // autoregressive (random-walk) component, applied only within the same animal
      mu[i] ~ normal(mu[i-1], sigma_auto);
    }
  }
  // comparison to data
  behaviour ~ normal(mu, sigma_behaviour);
  // priors (half-Cauchy, since both sigmas are constrained to be positive)
  sigma_auto ~ cauchy(0, 0.05);
  sigma_behaviour ~ cauchy(0, 0.05);
}

The syntax looks a bit like R, but I recommend reading the manual. With the model saved as model.stan, you can run it like this:

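library(rstan)

# paste the full data from the question here
df = read.table(text="animalID behaviour 
1         1
...
", header=TRUE)

# compile model.stan and sample from it
fit <- stan("model.stan", iter=1000, data=list(
    N=nrow(df),
    animal=df$animalID,
    behaviour=df$behaviour
))

# plot the observed behaviour codes, then overlay one faint line per posterior draw of mu
plot(df$behaviour)
mu <- rstan::extract(fit, 'mu')$mu  # matrix: one row per draw, one column per observation
for (i in 1:nrow(mu)) {
    lines(mu[i,], lwd=0.2)
}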

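The stan call compiles the model (via a C++ compiler) and samples from it; iter is the number of iterations per chain (by default 4 chains, with the first half of each used as warmup). The extract line pulls the samples of mu from the posterior, which I then plot over the data.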
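To see what the Stan model assumes, here is a forward simulation of the same random-walk structure in plain R; it can also serve to generate autocorrelated test data. The values of sigma_auto and sigma_behaviour, the starting level of 1 and the restart at each new animal are hypothetical choices (the Stan model itself leaves the first value of each animal unconstrained).

```r
set.seed(42)

animal <- c(rep(1, 8), rep(2, 7))  # animal IDs as in the question's data
N <- length(animal)
sigma_auto <- 0.05       # hypothetical random-walk step SD
sigma_behaviour <- 0.05  # hypothetical observation noise SD

mu <- numeric(N)
for (i in seq_len(N)) {
  if (i > 1 && animal[i] == animal[i - 1]) {
    # same animal: the next value is a noisy step away from the previous one
    mu[i] <- rnorm(1, mean = mu[i - 1], sd = sigma_auto)
  } else {
    # first observation of an animal: hypothetical starting level
    mu[i] <- 1
  }
}

# observations scatter around mu, mirroring behaviour ~ normal(mu, sigma_behaviour)
behaviour_sim <- rnorm(N, mean = mu, sd = sigma_behaviour)
```

Increasing sigma_auto relative to sigma_behaviour lets mu jump further between neighbouring observations, i.e. weaker smoothing.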

Hope that helps!

【Discussion】: