【Question title】: K fold cross validation---KeyError: '[] not in index' [closed]
【Posted】: 2021-11-30 09:49:26
【Question】:

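I'm having trouble applying k-fold cross-validation. Could someone please help me with it? When I use train_test_split everything works, but with k-fold I get an indexing error.

How do I apply k-fold to my dataset?

My code looks like this: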

import pandas as pd
from sklearn.model_selection import KFold
df = pd.read_csv('CD.TXT',delimiter=',')
df.head() 
X = df[['A', 'B', 'C', 'D']].values
Y=df['Label'].values
X=pd.DataFrame(X)
Y=pd.DataFrame(Y)
cv = KFold(n_splits=10, random_state=42, shuffle=False)
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]
print(X_train)
print(Y_train)

My dataset looks like this:

A,B,C,D,Label
10,20,30,40,1
20,20,15,60,0
10,20,30,40,1
10,20,30,40,1
10,20,39,40,1
10,20,30,40,1
10,20,30,40,1
10,20,32,40,1
10,20,30,40,1
10,20,30,40,1
10,20,3,40,1
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0

The error I get:

Test Index:  [18]
Traceback (most recent call last):

  File "<ipython-input-11-10016b897261>", line 1, in <module>
    runfile('D:/experiments/untitled0.py', wdir='D:/experiments')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/experiments/untitled0.py", line 61, in <module>
    X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2934, in __getitem__
    raise_missing=True)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1252, in _validate_read_indexer
    raise KeyError("{} not in index".format(not_found))

KeyError: '[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] not in index'

【Discussion】:

    Tags: python dataframe machine-learning scikit-learn k-fold


    【Answer 1】:

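    The error occurs because X and Y are DataFrames, and X[train_index] on a DataFrame selects columns by label, not rows by position. X only has the column labels 0 to 3, so the row positions 4 to 17 in train_index are reported as not in the index.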
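    The sketch below reproduces the failure without the CSV file. It assumes a hypothetical stand-in of 19 rows of random integers: the printed Test Index: [18] for the last fold is consistent with the real file having 19 rows, one more than the sample shown. With KFold(n_splits=10) and no shuffling, the last fold trains on rows 0 to 17; X_df[train_index] treats those numbers as column labels, and only labels 0 to 3 exist.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for CD.TXT: 19 rows, 4 feature columns
rng = np.random.default_rng(0)
X_arr = rng.integers(0, 50, size=(19, 4))
X_df = pd.DataFrame(X_arr)  # column labels are the integers 0, 1, 2, 3

# No shuffling, so each test fold is a contiguous block of rows
folds = list(KFold(n_splits=10).split(X_arr))
train_index, test_index = folds[-1]
print("Test Index: ", test_index)

# df[array] looks up COLUMN labels, so row positions 4..17 are "missing"
try:
    X_df[train_index]
except KeyError as exc:
    print("KeyError:", exc)

# Positional row selection works with .iloc or on the plain NumPy array
print(X_df.iloc[train_index].shape)
print(X_arr[train_index].shape)
```

    Both of the fixes below boil down to one of the last two lines: index a NumPy array directly, or use .iloc on the DataFrame.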

    Try commenting out X=pd.DataFrame(X) and Y=pd.DataFrame(Y) so that X and Y stay NumPy arrays, which are indexed by row position:

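    import pandas as pd
    from sklearn.model_selection import KFold

    df = pd.read_csv('CD.TXT', delimiter=',')
    X = df[['A', 'B', 'C', 'D']].values  # NumPy array, shape (n_samples, 4)
    Y = df['Label'].values               # NumPy array, shape (n_samples,)
    #X=pd.DataFrame(X)  # commented out: keep X as a NumPy array
    #Y=pd.DataFrame(Y)  # commented out: keep Y as a NumPy array

    # random_state only has an effect when shuffle=True; recent scikit-learn
    # versions raise a ValueError if it is combined with shuffle=False.
    cv = KFold(n_splits=10, shuffle=True, random_state=42)
    for train_index, test_index in cv.split(X):
        print("Train Index: ", train_index, "\n")
        print("Test Index: ", test_index)
        # NumPy arrays are indexed by row position, so this selects rows.
        # Splitting inside the loop uses every fold, not just the last one.
        X_train, X_test = X[train_index], X[test_index]
        Y_train, Y_test = Y[train_index], Y[test_index]
        print(X_train)
        print(Y_train)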
    

    Or keep the DataFrames and select rows by position with .iloc:

    import pandas as pd
    from sklearn.model_selection import KFold

    df = pd.read_csv('CD.TXT', delimiter=',')
    X = df[['A', 'B', 'C', 'D']].values
    Y = df['Label'].values
    X = pd.DataFrame(X)
    Y = pd.DataFrame(Y)

    # random_state requires shuffle=True in recent scikit-learn versions
    cv = KFold(n_splits=10, shuffle=True, random_state=42)
    for train_index, test_index in cv.split(X):
        print("Train Index: ", train_index, "\n")
        print("Test Index: ", test_index)
        # .iloc selects rows by integer position; plain X[...] looks up column labels
        X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
        Y_train, Y_test = Y.iloc[train_index], Y.iloc[test_index]
        print(X_train)
        print(Y_train)
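
    Once the splits are correct, each fold is normally used to fit and score a model. The sketch below is illustrative only: it assumes a DecisionTreeClassifier (the question does not say which model is used) and rebuilds the 18 sample rows from the question inline instead of reading CD.TXT. The manual loop fits a fresh copy of the model per fold; cross_val_score with the same KFold object performs the same evaluation in one call.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# The 18 sample rows from the question (columns A, B, C, D, Label)
data = np.array([
    [10, 20, 30, 40, 1], [20, 20, 15, 60, 0], [10, 20, 30, 40, 1],
    [10, 20, 30, 40, 1], [10, 20, 39, 40, 1], [10, 20, 30, 40, 1],
    [10, 20, 30, 40, 1], [10, 20, 32, 40, 1], [10, 20, 30, 40, 1],
    [10, 20, 30, 40, 1], [10, 20, 3, 40, 1], [20, 20, 15, 60, 0],
    [20, 20, 15, 60, 0], [20, 20, 12, 60, 0], [20, 20, 15, 60, 0],
    [20, 20, 15, 60, 0], [20, 20, 12, 60, 0], [20, 20, 15, 60, 0],
])
X, Y = data[:, :4], data[:, 4]

# Illustrative model choice (assumption); any scikit-learn estimator works
model = DecisionTreeClassifier(random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=42)

# Manual loop: fit an unfitted copy of the model on each training fold
manual_scores = []
for train_index, test_index in cv.split(X):
    fold_model = clone(model)
    fold_model.fit(X[train_index], Y[train_index])
    manual_scores.append(fold_model.score(X[test_index], Y[test_index]))

# One-call equivalent; scoring defaults to the classifier's .score (accuracy)
auto_scores = cross_val_score(model, X, Y, cv=cv)

print("per-fold accuracy:", np.round(manual_scores, 2))
print("mean accuracy:", np.mean(auto_scores))
```

    Because KFold with an integer random_state yields the same splits on every call, the two approaches score identical folds; the manual loop is only needed when you want per-fold access to the fitted models or predictions.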
    

    【Discussion】:
