这是一个快速实验,可以消除任何剩余的混乱。
从统计上讲,NN 层的权重通常接近正态分布(但不一定),但即使在实践中尝试对完美正态分布进行采样时,也存在总是计算错误。
然后考虑以下实验:
DIM = 1_000_000 # set our dims for weights and input
x = np.ones((DIM,1)) # our input vector
#x = np.random.rand(DIM,1)*2-1.0 # or could also be a more realistic normalized input
probs = [1.0, 0.7, 0.5, 0.3] # define dropout probs
W = np.random.normal(size=(DIM,1)) # sample normally distributed weights
print("W-mean = ", W.mean()) # note the mean is not perfect --> sampling error!
# DO THE DRILL
h = defaultdict(list)
for i in range(1000):
for p in probs:
M = np.random.rand(DIM,1)
M = (M < p).astype(int)
Wp = W * M
a = np.dot(Wp.T, x)
h[str(p)].append(a)
for k,v in h.items():
print("For drop-out prob %r the average linear activation is %r (unscaled) and %r (scaled)" % (k, np.mean(v), np.mean(v)/float(k)))
样本输出:
x-mean = 1.0
W-mean = -0.001003985674840264
For drop-out prob '1.0' the average linear activation is -1003.985674840258 (unscaled) and -1003.985674840258 (scaled)
For drop-out prob '0.7' the average linear activation is -700.6128015029908 (unscaled) and -1000.8754307185584 (scaled)
For drop-out prob '0.5' the average linear activation is -512.1602655283492 (unscaled) and -1024.3205310566984 (scaled)
For drop-out prob '0.3' the average linear activation is -303.21194422742315 (unscaled) and -1010.7064807580772 (scaled)
请注意,由于统计上不完美的正态分布,未缩放的激活会减少。
您能发现W-mean 与平均线性激活均值之间的明显相关性吗?