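【Question title】: Many-to-many sequence prediction with variable-length input/output in Keras
【Posted】: 2020-11-04 11:40:01
【Question】:

I am trying to use Keras to predict many-to-many sequences with variable-length input and output. The DataFrame below represents the data: 5 feature columns and 1 target column.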

    import pandas as pd

    df3={'email': [[0,0,0,1],[0,1,2],[0,3,1,5],[0,0,0,1],[0,1,2],[0,3,1,5]],
         'fax':[[0,1,0,1],[3,2],[0,2,1,5,4,6],[0,1,0,1],[3,2],[0,2,1,5,4,6]],
         'physical_mail':[[0,0,0,2],[0,2],[0,9,1,3,4,0],[0,0,3,0],[1,2],[0,2,0,2,4,6]],
         'cold_call':[[0,0,0,0,0,0],[0,2,0,0],[0,1,1,3,2,0,2,2],[0,0,3,0,0,0,0],[1,2,5,0,0,1,2],[0,2,0,2,4,3,9,0,6]],
         'in_person':[[0,0,0,0,0,0],[0,0,0,0],[0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,1],[1,0,0,0,0,0,0],[0,2,0,2,0,0,9,0,0,0,0,1]],
         'tar':[[0,1],[0,0,0,0],[0,0,0,0,1],[0,1],[0,0,0,0],[0,0,0,0,1]]
         }
    df4=pd.DataFrame(df3)

To reshape the data: x has 6 samples and 5 columns, fed one column at a time; y has 6 samples and 1 column.

    x_train=df4[['email','fax','physical_mail','cold_call','in_person']].values.reshape(6,5,1)
    y_train=df4.tar.values.reshape(6,1,1)


 
    from keras.models import Sequential
    from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

    model = Sequential()
    # 5 columns passed one at a time, hence input_shape=(5, 1)
    model.add(LSTM(64, input_shape=(5, 1)))
    # not sure what the RepeatVector argument should be
    model.add(RepeatVector(10))
    model.add(LSTM(64, return_sequences=True))
    model.add(TimeDistributed(Dense(1)))
    model.add(Activation('linear'))
    model.compile(loss='mean_squared_error', optimizer='rmsprop')

I get the error "setting an array element with a sequence". Is this because the input is a mixture of lists? If so, how do I flatten it?
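The error can be reproduced outside Keras. The following is a minimal sketch, assuming only NumPy and using the first three `email` sequences from the data above: converting lists of unequal length to a numeric array fails, while rows of equal length convert cleanly. The variable names are illustrative only.

```python
import numpy as np

# First three 'email' sequences from the question: lengths 4, 3 and 4.
ragged = [[0, 0, 0, 1], [0, 1, 2], [0, 3, 1, 5]]

try:
    np.asarray(ragged, dtype=np.float32)
    message = None
except ValueError as exc:
    # NumPy cannot build a rectangular numeric array from ragged rows.
    message = str(exc)

print(message)

# Once every row has the same length, the conversion succeeds.
padded = [[0, 0, 0, 1], [0, 0, 1, 2], [0, 3, 1, 5]]
arr = np.asarray(padded, dtype=np.float32)
print(arr.shape)  # (3, 4)
```

This suggests the problem is the unequal lengths rather than the nesting itself, which is why padding comes up in the comments.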

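【Comments】:

  • I am trying to understand why each variable in the DataFrame is a list of variable-length lists.
  • I can make them uniform by padding to the maximum length; would that help?
  • Yes, that is pretty much necessary.
  • pad_sequences(df4[['email','fax','physical_mail','cold_call','in_person']].values,maxlen=12). Lame question, but this fails. Do I have to use numpy, or am I doing something silly?
  • Hmm, let me try.

Tags: keras sequence-to-sequence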


【Solution 1】:

Try this:

np.array([np.concatenate(pad_sequences(list(v), maxlen=12)) for k,v in df4[['email','fax','physical_mail','cold_call','in_person']].items()])
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 3, 1, 5],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        3, 2, 0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0,
        0, 2, 1, 5, 4, 6],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 2, 0, 0, 0, 0, 0, 0, 0, 9, 1, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
        0, 2, 0, 2, 4, 6],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
        0, 0, 0, 0, 0, 0, 0, 1, 1, 3, 2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 3,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 5, 0, 0, 1, 2, 0, 0, 0, 0, 2, 0,
        2, 4, 3, 9, 0, 6],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0,
        9, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 1]], dtype=int32)

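This gives you one 1D array per DataFrame column (one row of the output each): each of the column's six sequences is padded to length 12 and the padded sequences are concatenated, 6 × 12 = 72 values per row. The sixth row in the output above corresponds to `tar`. I am assuming this is what you need. If you need a 2D array per column instead, drop the concatenation: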

np.array([pad_sequences(list(v), maxlen=12) for k,v in df4.items()])
array([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
        [0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
        [0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
        [0, 0, 0, 0, 0, 0, 0, 9, 1, 3, 4, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
        [0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 4, 6]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 3, 2, 0, 2, 2],
        [0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 2, 5, 0, 0, 1, 2],
        [0, 0, 0, 0, 2, 0, 2, 4, 3, 9, 0, 6]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 2, 0, 2, 0, 0, 9, 0, 0, 0, 0, 1]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]], dtype=int32)
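The zeros on the left of each row come from the defaults of `pad_sequences` (`padding='pre'`, `truncating='pre'`, `value=0`). As a rough illustration of that default behaviour, here is a NumPy-only sketch; `pad_left` is a hypothetical helper written for this example, not part of Keras.

```python
import numpy as np

def pad_left(sequences, maxlen, value=0):
    """Left-pad each sequence to maxlen; keep only the last maxlen items if longer."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]          # drop items from the front if too long
        if kept:
            out[i, maxlen - len(kept):] = kept
    return out

# First three 'email' sequences from the question.
email = [[0, 0, 0, 1], [0, 1, 2], [0, 3, 1, 5]]
padded_email = pad_left(email, 12)
print(padded_email)

# A sequence longer than maxlen loses its leading items.
print(pad_left([[1, 2, 3, 4, 5]], 3))  # [[3 4 5]]
```

The rows printed for `email` should match the first three rows of the first block in the output above.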

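【Comments】:

  • Thanks again. Hmm, if I use the 2D arrays, my input becomes shape (5,6,12). How does that change my input to the LSTM? I had 5 columns fed one at a time, but now I have 12 columns (the length of each sequence)? input_shape=(5,12)?
  • It depends on what you want to encode. Are you trying to encode the complete sequence of all parameters concatenated together, or are you trying to encode each of the 5 input sequences separately?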
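One possible reading, offered only as an assumption: each of the 12 padded positions is a time step, and the 5 feature columns are the features observed at that step. A Keras LSTM expects input shaped (samples, timesteps, features), so the (5, 6, 12) array would need its axes reordered to (6, 12, 5). The sketch below uses a stand-in array rather than the real data to show the axis move, and why a plain `reshape` is not a substitute.

```python
import numpy as np

# Stand-in for the padded result, indexed as [column, sample, position].
# Distinct values make it easy to see where each element ends up.
n_columns, n_samples, maxlen = 5, 6, 12
by_column = np.arange(n_columns * n_samples * maxlen).reshape(n_columns, n_samples, maxlen)

# Move the axes so the array is indexed as [sample, position, column].
x_train = np.transpose(by_column, (1, 2, 0))
print(x_train.shape)  # (6, 12, 5)

# Each element keeps its meaning: same column, sample and position.
print(x_train[3, 7, 2] == by_column[2, 3, 7])  # True

# A plain reshape to the same shape only re-chunks the values in memory order,
# so columns, samples and positions get mixed up.
scrambled = by_column.reshape(n_samples, maxlen, n_columns)
print(np.array_equal(x_train, scrambled))  # False
```

Under that assumption the first layer would be declared with `input_shape=(12, 5)`. If the intent is instead to encode each of the 5 sequences separately, a different layout would be needed.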