【Question title】: How to reshape a Series to use it in StandardScaler
【Posted】: 2020-11-06 19:59:52
【Question】:

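I'm trying to run a time series model (an LSTM) on some data I've processed, and now I'm trying to scale it with StandardScaler from sklearn. Here is my initial preprocessing of the data: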

import json
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os
import pandas as pd
import sklearn
import torch
import torch.nn as nn
import torch.nn.functional as F

data = pd.read_csv('./data/bitstampUSD_1-min_data_2012-01-01_to_2020-04-22.csv')
data.isnull().values.any()

from datetime import datetime
data.dropna(subset = ["Weighted_Price"], inplace=True)
data.reset_index(drop=True, inplace=True)  # many rows have null prices, so drop them and reindex


data['date'] = pd.to_datetime(data['Timestamp'],unit='s').dt.date
group = data.groupby('date')
daily_price = group['Weighted_Price'].mean()

daily_price.head()

df_train = daily_price[0:1800]  # first ~60% of days
df_train.shape
df_validation = daily_price[1801:2500]  # up to ~80% (starting at 1801 skips row 1800)
df_validation.shape
df_test = daily_price[2500:]  # remaining ~20%
df_test.shape

The shapes of the training, validation, and test sets are (1800,), (699,), and (533,) respectively.
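The trailing comma in those shapes is the key detail: selecting one column from a groupby and taking its mean returns a pandas Series, which is one-dimensional. The sketch below reproduces the preprocessing on synthetic minute-level data (the timestamps and random prices are made up for illustration, not taken from the Bitstamp file) to confirm this. It also shows a chronological split computed from fractions, with the 0.6 and 0.8 cut points taken from the #60% and #80% comments, so that consecutive slices share their boundaries and no day is dropped (unlike [0:1800] followed by [1801:2500], which skips row 1800).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Bitstamp CSV: one row per minute for 10 days.
# Column names mirror the question; the prices are random, for illustration only.
rng = np.random.default_rng(0)
n_minutes = 10 * 24 * 60
timestamps = 1_325_376_000 + 60 * np.arange(n_minutes)  # Unix seconds from 2012-01-01 UTC
data = pd.DataFrame({
    "Timestamp": timestamps,
    "Weighted_Price": 5 + rng.random(n_minutes),
})

# Same aggregation as the question: mean price per calendar day.
data["date"] = pd.to_datetime(data["Timestamp"], unit="s").dt.date
daily_price = data.groupby("date")["Weighted_Price"].mean()
print(type(daily_price).__name__, daily_price.shape)  # Series (10,)

# Chronological split by fraction: contiguous, non-overlapping slices
# that together cover every day.
n = len(daily_price)
train_end = int(n * 0.6)
val_end = int(n * 0.8)
df_train = daily_price.iloc[:train_end]
df_validation = daily_price.iloc[train_end:val_end]
df_test = daily_price.iloc[val_end:]
print(df_train.shape, df_validation.shape, df_test.shape)  # (6,) (2,) (2,)
```

Each split is still a 1-D Series, which is exactly what StandardScaler rejects.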

When I try to run StandardScaler like this:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train_arr = scaler.fit_transform(df_train)
val_arr = scaler.transform(df_validation)
test_arr = scaler.transform(df_test)

I get the following error:

Expected 2D array, got 1D array instead:
array=[  4.47160287   4.80666667   5.         ... 759.70635334 751.50645584
 755.52545612].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample
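Both suggestions in that message produce a 2-D array, but with different meanings. scikit-learn reads X as (n_samples, n_features), and a single price series is one feature observed on many days, so reshape(-1, 1) is the orientation that fits here. The sketch below uses small made-up arrays (stand-ins, not the real prices) to show the two shapes and what StandardScaler does with each: with (1, -1), every column holds a single observation, so the fitted output is all zeros, and a split of a different length no longer matches the number of features seen during fit.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.arange(1.0, 11.0)        # stand-in for 10 days of prices, shape (10,)
validation = np.arange(11.0, 15.0)  # stand-in for 4 later days, shape (4,)

one_feature = train.reshape(-1, 1)  # 10 samples x 1 feature
one_sample = train.reshape(1, -1)   # 1 sample x 10 features
print(one_feature.shape, one_sample.shape)  # (10, 1) (1, 10)

# Column orientation: fit on the training column, reuse it for validation.
good = StandardScaler()
train_scaled = good.fit_transform(one_feature)
val_scaled = good.transform(validation.reshape(-1, 1))
print(train_scaled.shape, val_scaled.shape)  # (10, 1) (4, 1)

# Row orientation: each of the 10 "features" has exactly one value,
# so its mean equals that value and the scaled result is all zeros.
bad = StandardScaler()
wrong_scaled = bad.fit_transform(one_sample)
print(np.all(wrong_scaled == 0))  # True

# The 4-day validation split now has 4 features instead of 10, so it is rejected.
rejected = False
try:
    bad.transform(validation.reshape(1, -1))
except ValueError as err:
    rejected = True
    print("ValueError:", err)
print(rejected)  # True
```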

I tried rewriting the code like this:

from sklearn.preprocessing import StandardScaler
df_train_2D = df_train.reshape(1,-1)
df_validation_2D = df_validation.reshape(1,-1)
df_test_2D = df_test.reshape(1,-1)
scaler = StandardScaler()
train_arr = scaler.fit_transform(df_train_2D)
val_arr = scaler.transform(df_validation_2D)
test_arr = scaler.transform(df_test_2D)

But I still get an error: 'Series' object has no attribute 'reshape'
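Current pandas has no Series.reshape, so the reshape has to happen on a NumPy array; alternatively the reshape can be skipped by handing the scaler a one-column DataFrame. A minimal sketch, using a short hypothetical Series in place of df_train: Series.to_numpy() (available since pandas 0.24) returns the underlying array, and Series.to_frame() builds a DataFrame of shape (n, 1), which StandardScaler accepts directly.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df_train: a 1-D Series of daily prices.
s = pd.Series([4.47, 4.81, 5.00, 759.71], name="Weighted_Price")
print(hasattr(s, "reshape"))  # False in current pandas

# Option 1: convert to a NumPy array, then reshape to one column.
as_array = s.to_numpy().reshape(-1, 1)
# Option 2: turn the Series into a one-column DataFrame.
as_frame = s.to_frame()
print(as_array.shape, as_frame.shape)  # (4, 1) (4, 1)

# Both inputs give the same scaled values.
scaled_from_array = StandardScaler().fit_transform(as_array)
scaled_from_frame = StandardScaler().fit_transform(as_frame)
print(np.allclose(scaled_from_array, scaled_from_frame))  # True
```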

I'm not sure how to turn this into an array the scaler will accept. Can someone help me figure out how to fix this? Thanks

【Question comments】:

    Tags: python scikit-learn


    【Answer 1】:

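    Series.reshape() was deprecated in pandas 0.19 and has since been removed, which is why you get "'Series' object has no attribute 'reshape'".

    You need to reshape the underlying values of the Series (or DataFrame) instead, which you can get by adding .values. Because your data is a single feature (one price per day), reshape it into one column with (-1, 1), as the scikit-learn error message suggests. With (1, -1), each split becomes a single sample with 1800, 699, or 533 features, so transform() rejects the validation and test sets for not matching the training set's feature count:

    df.values.reshape(-1, 1)
    

    Or, in your case (and likewise for df_train and df_test):

    df_validation_2D = df_validation.values.reshape(-1, 1)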
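Putting it together for all three splits, here is a sketch that assumes a synthetic daily_price Series and a 60/20/20 chronological split (both hypothetical, chosen only so the example runs on its own). It fits the scaler on the training split alone and reuses that mean and standard deviation for validation and test, which keeps statistics from later days out of training. inverse_transform then maps scaled values, such as the LSTM's predictions, back to prices.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical daily prices standing in for daily_price from the question.
daily_price = pd.Series(np.linspace(4.0, 760.0, 100))

df_train = daily_price.iloc[:60]
df_validation = daily_price.iloc[60:80]
df_test = daily_price.iloc[80:]

scaler = StandardScaler()
# Fit on the training split only, then reuse its mean/std for the other splits.
train_arr = scaler.fit_transform(df_train.values.reshape(-1, 1))
val_arr = scaler.transform(df_validation.values.reshape(-1, 1))
test_arr = scaler.transform(df_test.values.reshape(-1, 1))
print(train_arr.shape, val_arr.shape, test_arr.shape)  # (60, 1) (20, 1) (20, 1)

# Map scaled values (e.g. model predictions) back to the original price scale.
restored = scaler.inverse_transform(test_arr).ravel()
print(np.allclose(restored, df_test.values))  # True
```

The scaled training column has mean 0 and standard deviation 1; the validation and test columns will generally not, since they are scaled with the training statistics.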
    
