【Question title】: TypeError: '(slice(0, 201617, 1), slice(None, None, None))' is an invalid key
【Posted】: 2019-09-11 20:08:24
【Question description】:

After reading the following data:

Head:
          Open       Close        High         Low       Volume   volume_adi   volume_obv  volume_obvm  ...  momentum_stoch  momentum_stoch_signal  momentum_wr  momentum_ao  others_dr  others_dlr  others_cr   nextClose
0  118.940002  118.950996  119.015999  118.926003  3468.199951 -1468.002197     0.000000     0.000000  ...       27.777779              27.777779   -72.222221     0.000000  14.749734    0.000000   0.000000  118.948997
1  118.954002  118.959000  118.974998  118.892998  3083.300049  1139.846680  3083.300049    -8.533334  ...       53.658535              35.663956   -46.341465     0.000000   0.008407    0.008407   0.006725  118.975998
2  118.966003  118.975998  118.990997  118.922997  2914.600098  3508.808105  2914.600098   722.250000  ...       67.479675              48.897923   -32.520325     0.000000   0.014291    0.014290   0.021017  118.985001
3  118.992996  118.985001  119.000000  118.967003  3088.800049  1909.547119  3088.800049  1195.560059  ...       74.796745              65.311653   -25.203253     0.000000   0.007565    0.007564   0.028583  118.987999
4  118.987999  118.987999  119.001999  118.953003  3175.399902  1641.685669  3175.399902  1525.533325  ...       77.235771              73.170731   -22.764227    -0.001633   0.002521    0.002521   0.031105  118.984001

like this:

column_names = ['Open', 'Close', ... , 'others_cr', 'nextClose']
dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = '?', comment='\t', index_col=False,
                      sep=',', skipinitialspace=True, skiprows=[1], dtype='float32')

print('Head:\n {}'.format(dataset.head()))

When I try to split the data and add a new dimension, I get the following error:

train_size = int(len(dataset) * 0.67)
train_dataset = dataset[0:train_size,:]

Error:

TypeError: '(slice(0, 201617, 1), slice(None, None, None))' is an invalid key

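Any help would be much appreciated. Thanks in advance.

【Comments】:

  • dataset is a pandas DataFrame. What is the correct way to select rows from a DataFrame?
  • Thanks @hpaulj, your note saved my day. I was trying to use NumPy slicing on a pandas.DataFrame.

Tags: pandas numpy tensorflow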


【Solution 1】:

I was trying to use NumPy slicing on a pandas.DataFrame. I solved it by converting the DataFrame to a NumPy array:

dt = dataset.values
dt = dt.astype('float32')
train_size = int(len(dt) * 0.67)
train_dataset = dt[0:train_size,:]
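
If the result should stay a DataFrame, the question raised in the comments (how to select rows from a DataFrame) can also be answered with position-based indexing through `.iloc`. The following is a minimal sketch, assuming a small made-up DataFrame in place of the real CSV; the column subset and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dataset: 10 rows, 3 float32 columns.
dataset = pd.DataFrame(
    np.arange(30, dtype="float32").reshape(10, 3),
    columns=["Open", "Close", "nextClose"],
)

train_size = int(len(dataset) * 0.67)

# NumPy-style indexing applied directly to the DataFrame is rejected,
# because DataFrame[...] interprets the tuple as a column key.
try:
    dataset[0:train_size, :]
    numpy_style_failed = False
except Exception:  # the exact exception type depends on the pandas version
    numpy_style_failed = True

# Position-based selection with .iloc accepts the same slices
# and returns DataFrames instead of arrays.
train_dataset = dataset.iloc[0:train_size, :]
test_dataset = dataset.iloc[train_size:, :]
```

Compared with `dataset.values`, this keeps the column names and the index, which may or may not matter depending on what consumes the data next.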

【Comments】:

    【Solution 2】:

    It is better to make sure the data is split randomly.

    import random
    import pandas as pd
    
    # Training set size: 67% of the rows (dataset.shape[0] is the row count)
    train_size = int(dataset.shape[0] * 0.67)
    
    # Randomly choose row positions for the training set
    train_loc = random.sample(range(dataset.shape[0]), train_size)
    train_dataset = dataset.iloc[train_loc, :]                      # training rows, selected by position
    test_dataset = dataset.drop(dataset.index[train_loc], axis=0)   # remaining rows form the test set
    

    You can also use scikit-learn to split the dataset:

    sklearn.model_selection.train_test_split
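
A minimal sketch of how `train_test_split` might be applied here, assuming scikit-learn is installed and using a made-up 100-row DataFrame in place of the real data. The column names, values, and `random_state=42` are illustrative choices, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the question's dataset: 100 rows, 3 float32 columns.
dataset = pd.DataFrame(
    np.arange(300, dtype="float32").reshape(100, 3),
    columns=["Open", "Close", "nextClose"],
)

# Shuffle the rows and keep 67% for training; the rest becomes the test set.
# random_state makes the shuffle reproducible.
train_dataset, test_dataset = train_test_split(
    dataset, train_size=0.67, random_state=42
)
```

Passing `shuffle=False` instead keeps the original row order, which matches the sequential split attempted in the question.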

    【Comments】:
