Predictions in Tensorflow with a 1-hidden layer Neural Network does not change - regression (answers)

【Question title】: Predictions in Tensorflow with a 1-hidden layer Neural Network does not change - regression
【Posted】: 2017-09-19 18:00:06
【Question】:

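I'm new to TensorFlow and to neural networks in general. I'm trying to develop a neural network that can predict property values (this is one of the getting-started competitions on Kaggle.com). I understand that a neural network might not be the best model for a regression problem, but I decided to give it a go.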

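When I use a single-layer neural network (no hidden layers, so this is probably just linear regression), the model predicts values that are close to the real values. When I add a hidden layer, however, all the predicted values are identical across a batch of 20 input tensors: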

   ('real', array([[ 181000.],
       [ 128900.],
       [ 161500.],
       [ 180500.],
       [ 181000.],
       [ 183900.],
       [ 122000.],
       [ 378500.],
       [ 381000.],
       [ 144000.],
       [ 260000.],
       [ 185750.],
       [ 137000.],
       [ 177000.],
       [ 139000.],
       [ 137000.],
       [ 162000.],
       [ 197900.],
       [ 237000.],
       [  68400.]]))
('prediction ', array([[ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687],
       [ 4995.10597687]]))

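Update: I noticed that the predicted values only reflect the bias of the output layer, while the weights of both the hidden layer and the output layer do not change and are always zero.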
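The symptom in the update (weights stuck at zero, only the output bias moving) is what backpropagation produces when two stacked layers with no activation between them both start at exactly zero. The sketch below is a minimal pure-Python illustration under stated assumptions: it uses a plain squared error for one sample rather than the RMSE plus L2 term from the post, and the function name `linear_net_grads` and the tiny 3-input, 2-unit shapes are made up for the example.

```python
# Sketch: gradients of a tiny two-layer *linear* network for one sample.
# pred = (x . W1 + b1) . W2 + b2,  loss = 0.5 * (pred - y) ** 2
# Assumption: all weights and biases start at exactly zero.

def linear_net_grads(x, y, W1, b1, W2, b2):
    n_in, n_hid = len(W1), len(W1[0])
    # Forward pass, with no activation between the layers.
    h = [sum(x[i] * W1[i][j] for i in range(n_in)) + b1[j] for j in range(n_hid)]
    pred = sum(h[j] * W2[j] for j in range(n_hid)) + b2
    delta = pred - y  # d(loss)/d(pred)
    # Backward pass.
    grad_b2 = delta
    grad_W2 = [h[j] * delta for j in range(n_hid)]  # scales with hidden output h
    grad_h = [W2[j] * delta for j in range(n_hid)]  # scales with output weights W2
    grad_W1 = [[x[i] * grad_h[j] for j in range(n_hid)] for i in range(n_in)]
    return grad_W1, grad_h, grad_W2, grad_b2  # grad_h is also the gradient of b1

x = [1.0, 2.0, 3.0]            # hypothetical input features
y = 181000.0                   # first label from the batch shown above
W1 = [[0.0, 0.0] for _ in x]   # hidden weights, all zero
b1 = [0.0, 0.0]
W2 = [0.0, 0.0]                # output weights, all zero
b2 = 0.0

grad_W1, grad_b1, grad_W2, grad_b2 = linear_net_grads(x, y, W1, b1, W2, b2)
print(grad_W1, grad_W2, grad_b2)
```

With these inputs every entry of `grad_W1` and `grad_W2` is zero, because each one is multiplied by either the zero hidden output or the zero output weights, while `grad_b2` is not. An L2 penalty on the weights also has zero gradient at zero, so under this assumption a gradient-based optimizer would only ever update the output bias.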

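To dig further into what was going wrong, I generated the graph of the model twice (once with the hidden layer and once without) so I could compare the two and see whether something was missing. Unfortunately both look correct to me, and I still don't understand why the model works without a hidden layer but not with one.

Graph of the working model (no hidden layer in the middle):

Graph of the non-working model (with a hidden layer and an output layer):

My complete code is below: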

# coding: utf-8
import math  # needed for math.sqrt in the weight initializers below
import tensorflow as tf
import numpy as np
def loadDataFromCSV(fileName , numberOfFields , numberOfOutputFields , numberOfRecords):
    XsArray = np.ndarray([numberOfRecords ,(numberOfFields-numberOfOutputFields)] , dtype=np.float64)
    YsArray = np.ndarray([numberOfRecords ,numberOfOutputFields] , dtype=np.float64)
    fileQueue = tf.train.string_input_producer(fileName)
    defaultValues = [[0]]*numberOfFields
    decodedLine = [[None]]*numberOfFields
    reader  = tf.TextLineReader()
    key , singleLine = reader.read(fileQueue)
    decodedLine = tf.decode_csv(singleLine,record_defaults=defaultValues)
    inputFeatures = decodedLine[0:numberOfFields-numberOfOutputFields]
    outputFeatures =decodedLine[numberOfFields-numberOfOutputFields:numberOfFields]
    with tf.Session() as session : 
        tf.global_variables_initializer().run()
        coor = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coor)
        for i in range(numberOfRecords) :
            XsArray[i,:] ,YsArray[i,:]  = session.run([inputFeatures , outputFeatures]) 
        coor.request_stop()
        coor.join(threads)
    return XsArray , YsArray
x , y =loadDataFromCSV(['/Users/mousaalsulaimi/Downloads/convertcsv.csv'] , 289 , 1, 1460)
num_steps = 10000
batch_size = 20 

graph = tf.Graph()
with graph.as_default() :
    with tf.name_scope('input'):
        inputProperties  = tf.placeholder(tf.float32 , shape=(batch_size ,287 ))
    with tf.name_scope('realPropertyValue') :
        outputValues = tf.placeholder(tf.float32,shape=(batch_size,1))
    with tf.name_scope('weights'):
        hidden1_w  = tf.Variable( tf.truncated_normal([287,1000],stddev=math.sqrt(3/(287+1000)) , dtype=tf.float32))
    with tf.name_scope('baises'):
        hidden1_b = tf.Variable( tf.zeros([1000] , dtype=tf.float32) )
    with tf.name_scope('hidden_layer'):
        hidden1 =tf.matmul(inputProperties,hidden1_w) + hidden1_b
    #hidden1_relu = tf.nn.relu(hidden1)
    #hidden1_dropout = tf.nn.dropout(hidden1_relu,.5)
    with tf.name_scope('layer2_weights'):
        output_w  = tf.Variable(tf.truncated_normal([1000,1],stddev=math.sqrt(3/(1000+1)) , dtype=tf.float32))
    with tf.name_scope('layer2_baises'):
        output_b = tf.Variable(tf.zeros([1] , dtype=tf.float32))
    with tf.name_scope('layer_2_predictions'):
        output =tf.matmul(hidden1,output_w) + output_b
    with tf.name_scope('predictions'):
        predictedValues = (output)
    loss = tf.sqrt(tf.reduce_mean(tf.square(predictedValues-outputValues)))
    loss_l2 = tf.nn.l2_loss(hidden1_w)
    with tf.name_scope('minimization') :
        minimum = tf.train.AdamOptimizer(.5).minimize(loss+.004*loss_l2)

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (y.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = x[offset:(offset + batch_size), 1:]
        batch_labels = y[offset:(offset + batch_size), :]
        print("real" , batch_labels)
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {inputProperties : batch_data, outputValues : batch_labels}
        _, l, predictions  , inp  = session.run([minimum, loss, predictedValues  ,inputProperties ], feed_dict=feed_dict)
        print("prediction " , predictions)
        print("loss : " , l)
        print("----------")

        print('+++++++++++')

I have also uploaded the data file convertcsv.csv here so you can take a look.

Any help figuring out what I am doing wrong is appreciated.

Thank you

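【Comments on the question】:

I don't think any of these is the cause of the poor performance, but I noticed 3 things. First, you use hidden1 instead of hidden_dropout to define output, so you are basically just doing linear regression now, because there is no activation function between the layers. Second, you may want to add regularization of output_w to loss_l2. Finally, 32 bits is usually more than enough, so explicitly using 64-bit floats probably makes no difference.
You could also try changing how the weights are initialized. If you use Xavier initialization, the standard deviation should be sqrt(3. / (in + out)). That is sqrt(3. / (287+1000)) for hidden1_w and sqrt(3. / (1000+1)) for output_w.
Thanks Styrke. I had removed the relu activation function and dropout because I thought they were causing the problem, and I have just put them back. I also tried Xavier initialization as you suggested, but nothing changed: the output layer still does not predict anything correctly.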
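The formula in the comment above is written with a float literal (`3.`), whereas the question's code uses `3/(287+1000)`. The tuple-style output in the question, such as `('real', array(...))`, suggests the script ran under Python 2, although the post does not state this. If so, `/` between two integers truncates and the standard deviation would come out as exactly zero. The sketch below shows both computations; the helper name `xavier_stddev` is made up for illustration, and `//` is used to reproduce Python 2's integer division on Python 3.

```python
import math

def xavier_stddev(fan_in, fan_out):
    """Standard deviation from the comment's formula, forced to float division."""
    return math.sqrt(3.0 / (fan_in + fan_out))

# Layer shapes taken from the question's code.
hidden1_std = xavier_stddev(287, 1000)   # for hidden1_w, shape [287, 1000]
output_std = xavier_stddev(1000, 1)      # for output_w, shape [1000, 1]

# What `3/(287+1000)` evaluates to under integer (floor) division.
truncated_std = math.sqrt(3 // (287 + 1000))

print(hidden1_std, output_std, truncated_std)
```

A zero standard deviation means `tf.truncated_normal` would return all-zero weights, which would match the "weights are always zero" observation in the question. Writing `3.0` (or adding `from __future__ import division` on Python 2) avoids the truncation.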

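Tags: python tensorflow neural-network artificial-intelligence

【Solution 1】:

OK, so I finally figured out what the problem was. As expected, it was the weights in the neural network. I also added some preprocessing to improve the predictions: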

import tensorflow as tf
import numpy as np
import math
from sklearn import preprocessing

def loadDataFromCSV(fileName , numberOfFields , numberOfOutputFields , numberOfRecords):
    XsArray = np.ndarray([numberOfRecords ,(numberOfFields-numberOfOutputFields)] , dtype=np.float64)
    YsArray = np.ndarray([numberOfRecords ,numberOfOutputFields] , dtype=np.float64)
    fileQueue = tf.train.string_input_producer(fileName)
    defaultValues = [[0]]*numberOfFields
    decodedLine = [[None]]*numberOfFields
    reader  = tf.TextLineReader()
    key , singleLine = reader.read(fileQueue)
    decodedLine = tf.decode_csv(singleLine,record_defaults=defaultValues)
    inputFeatures = decodedLine[0:numberOfFields-numberOfOutputFields]
    outputFeatures =decodedLine[numberOfFields-numberOfOutputFields:numberOfFields]
    with tf.Session() as session :
        tf.global_variables_initializer().run()
        coor = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coor)
        for i in range(numberOfRecords) :
            XsArray[i,:] ,YsArray[i,:]  = session.run([inputFeatures , outputFeatures])
        coor.request_stop()
        coor.join(threads)
    return XsArray , YsArray
x , y =loadDataFromCSV(['/Users/mousaalsulaimi/Downloads/convertcsv.csv'] , 289 , 1, 1460)
num_steps = 10000
batch_size = 20



graph = tf.Graph()
beta = .00009
with graph.as_default() : 
     keepprop = tf.placeholder( tf.float32 , shape=([1]) )
     with tf.name_scope('input'):
         inputProperties  = tf.placeholder(tf.float32 , shape=(None ,287 ))
     with tf.name_scope('realPropertyValue') : 
         outputValues = tf.placeholder(tf.float32,shape=(None,1))
     with tf.name_scope('weights'):
         hidden1_w  = tf.Variable( tf.truncated_normal([287,2000],stddev=math.sqrt(3/(1)) , dtype=tf.float32))
     with tf.name_scope('baises'):
         hidden1_b = tf.Variable( tf.zeros([2000] , dtype=tf.float32) )
     with tf.name_scope('hidden_layer'):
         hidden1 =tf.matmul(inputProperties,hidden1_w) + hidden1_b  
         hidden1_relu = tf.nn.relu(hidden1)
         hidden1_dropout = tf.nn.dropout(hidden1_relu,keep_prob=keepprop[0])
     with tf.name_scope('layer2_weights'):
         hidden2_w  = tf.Variable(tf.truncated_normal([2000,500],stddev=math.sqrt(3/(1)) , dtype=tf.float32))
     with tf.name_scope('layer2_baises'):
         hidden2_b = tf.Variable(tf.zeros([500] , dtype=tf.float32))
     with tf.name_scope('layer_2'):
         hidden2 =tf.matmul(hidden1_dropout,hidden2_w) + hidden2_b
         hidden2_relu = tf.nn.relu(hidden2)
     hidden2_dropout= tf.nn.dropout(hidden2_relu,keepprop[0])
     with tf.name_scope('output_layer_weights'): 
         output_w = tf.Variable(tf.truncated_normal([500,1],stddev=math.sqrt(3/(1)) , dtype=tf.float32))
     with tf.name_scope('outout_layer_baises'):
         output_b = tf.Variable(tf.zeros([1] , dtype=tf.float32))
     with tf.name_scope('output_layer'):
         output = tf.matmul(hidden2_dropout,output_w) + output_b    
     with tf.name_scope('predictions'):
         predictedValues = tf.nn.relu(output)
     loss = tf.sqrt(tf.reduce_mean(tf.square((predictedValues)-(outputValues))))
     loss_l2 = tf.nn.l2_loss(hidden1_w) + tf.nn.l2_loss(hidden2_w) + tf.nn.l2_loss(output_w) + tf.reduce_mean(output_w) + tf.reduce_mean(hidden2_w) + tf.reduce_mean(hidden1_w)
     global_step = tf.Variable(0,trainable=False)
     start_step = .5 
     learning_rate = tf.train.exponential_decay(start_step ,global_step , 100 , .94 , staircase=True)
     with tf.name_scope('minimization') : 
         minimum = tf.train.AdadeltaOptimizer(learning_rate).minimize(loss+beta*loss_l2 , global_step=global_step)

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    '''writer = tf.summary.FileWriter('/Users/mousaalsulaimi/Downloads/21' , graph=graph)'''
    num_steps = 1000
    batch_size = 730
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (y.shape[0] - batch_size)
        # Generate a minibatch.
        # Rescale this batch's features to [0, 1]; note the scaler is refit on every batch.
        batch_data_ss = preprocessing.MinMaxScaler().fit(x[offset:(offset + batch_size), 1:])
        batch_data = batch_data_ss.transform(x[offset:(offset + batch_size), 1:])
        batch_labels = y[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {keepprop:[.65], inputProperties : batch_data, outputValues : batch_labels }
        _, l, predictions  , inp , w_l   = session.run([minimum, loss, predictedValues  ,inputProperties , hidden1_w   ], feed_dict=feed_dict)
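        print("loss2 : " , l )
        # NOTE: accuricy() is not defined anywhere in the posted code; its definition was not included in the answer.
        print("loss : " , accuricy((predictions) ,( batch_labels)) )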

Here is a sample of the output:

('loss : ', 0.15377927727091956)
('loss2 : ', 29109.197)
('loss : ', 0.1523804301893735)
('loss2 : ', 29114.414)
('loss : ', 0.15479254974665729)
('loss2 : ', 30617.834)
('loss : ', 0.15270011182205656)
('loss2 : ', 29519.598)
('loss : ', 0.15641723449772593)
('loss2 : ', 29307.811)
('loss : ', 0.15460120852074882)
('loss2 : ', 27985.998)
('loss : ', 0.14993038617463786)
('loss2 : ', 28811.738)
('loss : ', 0.1549284462882819)
('loss2 : ', 29157.725)
('loss : ', 0.15402833737387819)
('loss2 : ', 27079.215)
('loss : ', 0.14974744509723023)
('loss2 : ', 26622.93)
('loss : ', 0.1419577502544874

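The predictions are not perfect, but the model is getting somewhere: as you can see, the error (the loss2 values, which are the RMSE) is around $30,000 per property price.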
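The `loss2` values come from the graph's `tf.sqrt(tf.reduce_mean(tf.square(...)))`, a root-mean-square error in the same units as the prices. The sketch below recomputes that quantity in plain Python to show how a figure near 30,000 can be read. The labels are the first four from the question's batch, while both prediction lists are hypothetical and chosen only for illustration.

```python
import math

def rmse(predictions, targets):
    """Root-mean-square error, the same formula as the graph's loss."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

def mae(predictions, targets):
    """Mean absolute error, for comparison."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

real = [181000.0, 128900.0, 161500.0, 180500.0]

# Hypothetical case 1: every prediction is off by the same 30,000.
uniform = [t + 30000.0 for t in real]
# Hypothetical case 2: three exact predictions and one miss of 60,000.
skewed = real[:3] + [real[3] + 60000.0]

uniform_rmse, uniform_mae = rmse(uniform, real), mae(uniform, real)
skewed_rmse, skewed_mae = rmse(skewed, real), mae(skewed, real)
print(uniform_rmse, uniform_mae, skewed_rmse, skewed_mae)
```

Both cases give the same RMSE, but the mean absolute error in the second case is half as large. RMSE weights large misses more heavily, so "about $30,000 off per property" is best taken as a rough reading of the loss rather than a typical per-house error.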

【Discussion】: