【Question title】: Rare error when computing a DataFrame with pandas
【Posted】: 2018-07-29 04:06:03
【Question】:

I'm getting a rare error while doing machine learning on a dataset loaded with pandas. This is the error I get:

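I've been reading about it, and it seems to be caused by the columns and how pandas interprets them, but I can't work out what is wrong. This is the code I'm using: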

import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names)
# define X and y
feature_cols = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age']
X = pima[feature_cols]
y = pima.label
#k fold cv
from sklearn.model_selection import KFold, cross_val_score
kf = KFold(n_splits=10) #define number of splits
kf.get_n_splits(X) #to check how many splits will be done.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
clf = LinearDiscriminantAnalysis() #select the model
for train, test in kf.split(X, y):
    y_pred_prob = clf.fit(X[train], y[train]).predict_proba(X[test])
    y_pred_class = clf.predict(X[test])

Thanks in advance.
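A minimal, self-contained reproduction of the failing step, using a small hypothetical DataFrame in place of the UCI download (the column names and values are made up for illustration). It shows what KFold.split hands back and what happens when those values are used with [] on a DataFrame:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for pima[feature_cols]: 10 rows, string column labels
X = pd.DataFrame({"glucose": np.arange(10), "bmi": np.arange(10, 20)})

kf = KFold(n_splits=5)  # no shuffling, so the first test fold is rows 0 and 1
train, test = next(kf.split(X))
print(train)  # positional row indices: [2 3 4 5 6 7 8 9]

raised = False
try:
    X[train]  # [] on a DataFrame looks these integers up as *column labels*
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

The integers are valid row positions, but there are no columns named 2, 3, ... so pandas raises a KeyError.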

【Discussion】:

    Tags: python-3.x pandas scikit-learn


    【Answer 1】:

    According to the documentation, clf.fit expects its parameters to be arrays:

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Training data.
    
    y : array, shape (n_samples,)
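A pandas DataFrame also counts as "array-like", so fit itself accepts one; what breaks is selecting the fold rows with X[train]. As an alternative to converting to arrays, here is a minimal sketch on hypothetical synthetic data (not the Pima file) that keeps DataFrames and selects rows positionally with .iloc:

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
# Hypothetical synthetic data: 40 rows, 3 feature columns, alternating 0/1 labels
X = pd.DataFrame(rng.normal(size=(40, 3)), columns=["glucose", "bmi", "age"])
y = pd.Series(np.tile([0, 1], 20), name="label")

clf = LinearDiscriminantAnalysis()
kf = KFold(n_splits=4)
for train, test in kf.split(X, y):
    # .iloc selects rows by position, which is what KFold.split returns;
    # fit and predict_proba accept the resulting DataFrames directly
    y_pred_prob = clf.fit(X.iloc[train], y.iloc[train]).predict_proba(X.iloc[test])
    y_pred_class = clf.predict(X.iloc[test])
    print(y_pred_class)
```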
    

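    In the linked example, X and y are numpy arrays. Try converting X and y with as_matrix() instead of using X = pima[feature_cols] and y = pima.label directly. This works because KFold.split returns positional row indices: on a NumPy array, X[train] selects those rows, whereas on a DataFrame, X[train] looks the integers up as column labels and fails. (as_matrix() was deprecated in pandas 0.23 and removed in 1.0; on newer versions use .to_numpy() or .values instead.)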

    X = pima[feature_cols].as_matrix()
    y = pima.label.as_matrix()
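The same conversion on current pandas, sketched with .to_numpy() (available since pandas 0.24) on a tiny hypothetical frame standing in for pima:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the downloaded Pima data
pima = pd.DataFrame({
    "glucose": [148, 85, 183, 89],
    "bmi": [33.6, 26.6, 23.3, 28.1],
    "label": [1, 0, 1, 0],
})
feature_cols = ["glucose", "bmi"]

# Replacement for .as_matrix(), which no longer exists in pandas >= 1.0
X = pima[feature_cols].to_numpy()
y = pima.label.to_numpy()

print(type(X), X.shape)  # <class 'numpy.ndarray'> (4, 2)
print(X[[0, 2]])         # integer-array indexing on an ndarray selects rows 0 and 2
```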
    

    Testing with print:

    import pandas as pd
    
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
    col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
    pima = pd.read_csv(url, header=None, names=col_names)
    
    # define X and y
    feature_cols = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age']
    X = pima[feature_cols].as_matrix()
    y = pima.label.as_matrix()
    
    #k fold cv
    from sklearn.model_selection import KFold, cross_val_score
    kf = KFold(n_splits=10) #define number of splits
    kf.get_n_splits(X) #to check how many splits will be done.
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    clf = LinearDiscriminantAnalysis() #select the model
    
    for train, test in kf.split(X, y):
        y_pred_prob = clf.fit(X[train], y[train]).predict_proba(X[test])
        y_pred_class = clf.predict(X[test])
        print(y_pred_class)
    

    Result:

    [1 0 1 0 1 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 1
     0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
     0 0 0]
    [0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
     1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
     0 1 1]
    [1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0
     0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 0 1 0 0 0 0 1 1 0 1 0 0 0 1
     1 0 1]
    [1 0 0 0 1 1 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1 1
     0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0
     0 1 0]
    [0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
     0 0 0]
    [0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0
     0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1
     1 0 0]
    [0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 1
     1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
     0 0 0]
    [0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
     0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 1 1
     0 1 0]
    [0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
     0 0 1 0 0 1 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1
     0 1]
    [0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
     0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0
     0 0]
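cross_val_score is imported in the code above but never used. As an alternative sketch (hypothetical synthetic data, not the Pima results shown above), it can run the whole 10-fold loop and handle the per-fold row selection itself, so the DataFrame/array indexing question does not arise:

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
# Hypothetical synthetic data in place of the downloaded Pima file
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["glucose", "bmi", "age"])
y = (X["glucose"] + rng.normal(scale=0.5, size=100) > 0).astype(int)

kf = KFold(n_splits=10)
# cross_val_score selects each fold's rows internally, so it accepts
# DataFrames as well as NumPy arrays; the default score is accuracy
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=kf)
print(scores.mean())
```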
    
