Randomly assign treatment dummy variables by session: answers

【Question title】: Randomly assign Treatment dummy variables according to session
【Posted】: 2017-12-19 09:50:05
【Question】:

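I want to assign a dummy variable called "sender", but I want the random assignment to happen within each session rather than across the whole experiment.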

Suppose I have data on 180 students. Each session has 18 students, so I have 10 sessions. Within each session there should be 9 senders (value 1) and 9 receivers (value 0).

So far I have only managed to do this across the whole experiment, as follows:

va <- c(1, 0)
# shuffle all 180 row indices, then fill them with 90 ones and 90 zeros
df$sender[sample(1:nrow(df), nrow(df), FALSE)] <- rep(va, times = 90)

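I was thinking of doing it the crude way, by repeating the same code once per session (10 times or more), but the data may get bigger. I would appreciate any help! Thanks!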

【Question comments】:

Tags: r dataframe dummy-variable

【Solution 1】:

I am not quite sure what your expected output is, but this should point you in the right direction:

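students = 5 # students per session
trials = 3   # number of sessions
senders = 2  # senders per session

# one row per student; `session` repeats each session number `students` times
df = data.frame(studentID = seq(1, students*trials),
                session = rep(seq(trials), times = rep(students, trials)))

# for each session, draw `senders` positions at random and flag them with 1
df$sender = unlist(sapply(seq(trials), function(x) {
  as.numeric(seq(1, students) %in% sample(students, senders))
}, simplify = FALSE))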

Output:

We now have 3 sessions of 5 students each, with exactly 2 senders per session.

   studentID session sender
1          1       1      0
2          2       1      0
3          3       1      1
4          4       1      1
5          5       1      0
6          6       2      1
7          7       2      0
8          8       2      1
9          9       2      0
10        10       2      0
11        11       3      1
12        12       3      1
13        13       3      0
14        14       3      0
15        15       3      0
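
To connect this back to the sizes in the question (10 sessions of 18 students, 9 senders each), here is a minimal sketch. Note that it swaps in base R's `ave()` for the `sapply()` call used above, and it assumes the data frame has a `session` column; the seed value and the name `sender_counts` are arbitrary choices for illustration, not part of the original answer.

```r
# Sizes assumed from the question: 10 sessions of 18 students, 9 senders each
students <- 18
sessions <- 10
senders  <- 9

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

df <- data.frame(studentID = seq_len(students * sessions),
                 session   = rep(seq_len(sessions), each = students))

# Within each session, shuffle a vector holding `senders` ones and the rest zeros
df$sender <- ave(df$studentID, df$session, FUN = function(ids) {
  sample(rep(c(1, 0), times = c(senders, length(ids) - senders)))
})

# Check: number of senders in each session
sender_counts <- tapply(df$sender, df$session, sum)
print(sender_counts)
```

Because the shuffle happens inside each session group, the count of senders per session is fixed by construction; only which students are flagged changes from run to run.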

Hope this helps!

【Comments】:

Just to clarify, this is my expected output! Thank you for the very concise code!
No problem, glad I could help!

【Solution 2】:

An alternative solution using dplyr:

library(dplyr)

# create example dataset
session_id = 1:10
student_id = 1:18

df = expand.grid(student_id=student_id, session_id=session_id)

# fix randomisation (to replicate results)
set.seed(259)

df %>%
  sample_frac(1) %>%                                     # shuffle dataset (i.e. resample 100% of rows)
  group_by(session_id) %>%                               # for each session id
  mutate(sender = ifelse(row_number() <= 9 , 1, 0)) %>%  # flag as senders the first 9 (random) rows
  ungroup() %>%                                          # forget the grouping
  arrange(session_id, student_id)                        # sort rows (not needed; only for visualisation purposes)


# # A tibble: 180 x 3
#   student_id session_id sender
#        <int>      <int>  <dbl>
# 1          1          1      0
# 2          2          1      0
# 3          3          1      1
# 4          4          1      1
# 5          5          1      0
# 6          6          1      1
# 7          7          1      0
# 8          8          1      1
# 9          9          1      0
# 10         10         1      1
# # ... with 170 more rows
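
The printed tibble only shows the first ten rows, so it does not by itself confirm the 9/9 split in every session. A minimal sketch of one way to check it, assuming the pipeline's result is stored in a hypothetical object `res` (the original code prints the result without assigning it) and summarised into a hypothetical `check` table:

```r
library(dplyr)

# Same example data as above: 18 students in each of 10 sessions
df <- expand.grid(student_id = 1:18, session_id = 1:10)

set.seed(259)

# Same pipeline as above, but stored in `res` (a name chosen for this sketch)
res <- df %>%
  sample_frac(1) %>%
  group_by(session_id) %>%
  mutate(sender = ifelse(row_number() <= 9, 1, 0)) %>%
  ungroup() %>%
  arrange(session_id, student_id)

# Count senders and students per session
check <- res %>%
  group_by(session_id) %>%
  summarise(n_senders = sum(sender), n_students = n())

print(check)
```

Since the first 9 shuffled rows of each session are flagged, `n_senders` should be 9 and `n_students` 18 for every session, whatever seed is used.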

【Comments】: