Data type issue while appending from a for loop (answer)

【Question title】: Data type issue while appending from a for loop
【Posted】: 2022-02-16 02:50:41
【Question description】:

I'm using this for loop to split a dataset into groups, but the list "y" is being converted into the wrong kind of array.

import numpy as np

def to_sequences(dataset, seq_size=1):
    x = []
    y = []

    for i in range(len(dataset)-seq_size):
        window = dataset[i:(i+seq_size), 0]
        x.append(window)
        window2 = dataset[(i+seq_size):i+seq_size+5, 0]
        y.append(window2)

    return np.array(x),np.array(y)

seq_size = 5 
trainX, trainY = to_sequences(train, seq_size)
print("Shape of training set: {}".format(trainX.shape))
print("Shape of training set: {}".format(trainY.shape))

This is the warning I get:

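VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array(x),np.array(y)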

I can't figure out why it works for "x" but not for "y". Any ideas?

【Comments on the question】:

Why do you say "it works for 'x' but not for 'y'"? It looks to me like y is exactly where the problem is. Have you tried the suggested fix of adding dtype=object to the np.array(x) call?
Does this answer your question? Debugging Numpy VisibleDeprecationWarning (ndarray from ragged nested sequences)
It gives the expected output for X, like this --- array([[1.6417541e-04, 1.8490013e-04, 5.3410418e-05, 8.7562017e-05, 7.6301396e-05], [1.8490013e-04, 5.3410418e-05, 8.7562017e-05, 7.6301396e-05, 9.8595303e-04],
But for y, even after converting the type, it gives this ---array([array([[0.00098595], [0.00388295], [0.00851235], [0.00851235], [0.01531321], [0.01527738]], dtype=float32), array([[0.00388295], [0.00851235], [0.01531321], [0.01527738], [0.02505753]], dtype=float32),
I think you should add these to your question. That is the relevant part: I initially thought your problem was the warning, but your problem is actually the output.
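A minimal sketch of why passing dtype=object, as suggested in the comments, does not produce a 2-D numeric array: with ragged input NumPy can only build a 1-D array whose elements are the original sub-arrays. The list named ragged is a hypothetical stand-in for y. Note that from NumPy 1.24 on, omitting dtype=object for ragged input raises ValueError instead of emitting the warning.

```python
import numpy as np

# Hypothetical stand-in for `y`: three full windows of length 5, then shorter tails.
ragged = [np.arange(5), np.arange(1, 6), np.arange(2, 7), np.arange(3, 7), np.arange(4, 7)]

# dtype=object silences the warning/error, but the result is a 1-D array whose
# elements are the original sub-arrays, not an (n, 5) numeric matrix.
as_objects = np.array(ragged, dtype=object)
print(as_objects.shape, as_objects.dtype)  # (5,) object

# Without dtype=object, NumPy >= 1.24 refuses to build the array at all;
# NumPy 1.20-1.23 emits VisibleDeprecationWarning instead of raising.
try:
    np.array(ragged)
except ValueError as exc:
    print("ValueError:", exc)
```

So "converting the type" only hides the symptom; the windows themselves still have different lengths.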

Tags: python arrays pandas list numpy

【Solution 1】:

In [247]: dataset = np.arange(20)
In [248]: def to_sequences(dataset, seq_size=1):
     ...:     x = []
     ...:     y = []
     ...:     for i in range(len(dataset)-seq_size):
     ...:         window = dataset[i:(i+seq_size), 0]
     ...:         x.append(window)
     ...:         window2 = dataset[(i+seq_size):i+seq_size+5, 0]
     ...:         y.append(window2)
     ...:     return np.array(x),np.array(y)
     ...:

And a sample run:

In [250]: to_sequences(dataset[:,None], 5)
<ipython-input-248-176eb762993c>:9: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array(x),np.array(y)
Out[250]: 
(array([[ 0,  1,  2,  3,  4],
        [ 1,  2,  3,  4,  5],
        [ 2,  3,  4,  5,  6],
        [ 3,  4,  5,  6,  7],
        [ 4,  5,  6,  7,  8],
        [ 5,  6,  7,  8,  9],
        [ 6,  7,  8,  9, 10],
        [ 7,  8,  9, 10, 11],
        [ 8,  9, 10, 11, 12],
        [ 9, 10, 11, 12, 13],
        [10, 11, 12, 13, 14],
        [11, 12, 13, 14, 15],
        [12, 13, 14, 15, 16],
        [13, 14, 15, 16, 17],
        [14, 15, 16, 17, 18]]),
 array([array([5, 6, 7, 8, 9]), array([ 6,  7,  8,  9, 10]),
        array([ 7,  8,  9, 10, 11]), array([ 8,  9, 10, 11, 12]),
        array([ 9, 10, 11, 12, 13]), array([10, 11, 12, 13, 14]),
        array([11, 12, 13, 14, 15]), array([12, 13, 14, 15, 16]),
        array([13, 14, 15, 16, 17]), array([14, 15, 16, 17, 18]),
        array([15, 16, 17, 18, 19]), array([16, 17, 18, 19]),
        array([17, 18, 19]), array([18, 19]), array([19])], dtype=object))

The first array has shape (n,5) and int dtype (n = 15 here). The second has object dtype and contains arrays: most of them are (5,), but the last ones are (4,), (3,), (2,) and (1,).
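A quick way to confirm this diagnosis on any list of windows is to tally the window shapes before calling np.array. This sketch rebuilds the same 20-element example with seq_size=5 and the question's 5-step target slice; any result with more than one shape means the list is ragged.

```python
from collections import Counter

import numpy as np

dataset = np.arange(20)[:, None]  # same (20, 1) column as the sample run
seq_size = 5

# Collect the target windows exactly as the question's loop does.
y = [dataset[i + seq_size:i + seq_size + 5, 0] for i in range(len(dataset) - seq_size)]

# Count how many windows have each shape; more than one key means a ragged list.
shape_counts = Counter(w.shape for w in y)
print(shape_counts)
```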

For the last few values of i, dataset[(i+seq_size):i+seq_size+5, 0] runs past the end of dataset. Python/numpy allow this, but the result is silently truncated.

If you want an (n,5)-shaped array, you'll have to rethink the y slicing.
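One way to rethink it, sketched under the assumption that every sample must have a complete target: stop the loop early enough that no target slice runs past the end. The horizon parameter is a hypothetical name for the target length (5 in the question); the loop runs len(dataset) - seq_size - horizon + 1 times, so both arrays come out rectangular and no dtype=object is needed.

```python
import numpy as np

def to_sequences(dataset, seq_size=1, horizon=5):
    """Split column 0 of a 2-D `dataset` into (input, target) windows.

    `horizon` is a hypothetical parameter: the length of each target window.
    The loop stops early so that no target slice runs past the end of the data.
    """
    x = []
    y = []
    for i in range(len(dataset) - seq_size - horizon + 1):
        x.append(dataset[i:i + seq_size, 0])                       # input window
        y.append(dataset[i + seq_size:i + seq_size + horizon, 0])  # full target window
    return np.array(x), np.array(y)


trainX, trainY = to_sequences(np.arange(20)[:, None], seq_size=5)
print(trainX.shape, trainY.shape)  # (11, 5) (11, 5)
```

The trade-off is that the last four samples, whose targets would be incomplete, are dropped (11 pairs instead of 15 for this 20-element example).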

Slicing past the end of a list:

In [252]: [1,2,3,4,5][1:4]
Out[252]: [2, 3, 4]
In [253]: [1,2,3,4,5][3:6]
Out[253]: [4, 5]
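A loop-free alternative (a swapped-in technique, not part of the answer above): NumPy 1.20+ provides numpy.lib.stride_tricks.sliding_window_view, which only yields complete windows, so this kind of truncation cannot happen. Taking windows of length seq_size + horizon and splitting each one gives rectangular x and y; to_sequences_windowed and horizon are hypothetical names used for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def to_sequences_windowed(dataset, seq_size=5, horizon=5):
    """Hypothetical vectorized variant: every window has length seq_size + horizon."""
    series = np.asarray(dataset)[:, 0]                         # column 0, as in the question
    windows = sliding_window_view(series, seq_size + horizon)  # shape (n, seq_size + horizon)
    x = windows[:, :seq_size]   # first seq_size values of each window
    y = windows[:, seq_size:]   # the following horizon values
    return x.copy(), y.copy()   # copy: sliding_window_view returns a read-only view


trainX, trainY = to_sequences_windowed(np.arange(20)[:, None])
print(trainX.shape, trainY.shape)  # (11, 5) (11, 5)
```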

【Discussion】: