【Question Title】: Simple Linear Regression using CSV data file Sklearn
【Posted】: 2019-01-19 18:02:39
【Question】:

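I've been trying this for the past few days with no luck. I want to do a simple linear regression fit and prediction with sklearn, but I can't get the data to work with the model. I know I'm not reshaping my data correctly; I just don't know how to do it.
Most recently I got this error: Found input variables with inconsistent numbers of samples: [1, 9]. This seems to mean that y has 9 values while X has only 1. I thought it should be the other way around, but when I print X it gives me one row of the CSV file, while y gives me all the rows of the CSV file. Any help with this would be greatly appreciated.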

Here is my code.

import pandas as pd
from sklearn.linear_model import LogisticRegression

filename = "E:/TestPythonCode/animalData.csv"

# Preprocess the data set
dataframe = pd.read_csv(filename, dtype = 'category')
print(dataframe.head())
# Get rid of the name of the animal
# and change the hunter/scavenger to 0/1
dataframe = dataframe.drop(["Name"], axis = 1)
cleanup = {"Class": {"Primary Hunter" : 0, "Primary Scavenger": 1}}
dataframe.replace(cleanup, inplace = True)
print(dataframe.head())
#array = dataframe.values
# Data split
# Separating the data into dependent and independent variables
X = dataframe.iloc[-1:]
y = dataframe.iloc[:,-1]
print(X)
print(y)

logReg = LogisticRegression()

#logReg.fit(X,y)
logReg.fit(X[:None],y)
#logReg.fit(dataframe.iloc[-1:],dataframe.iloc[:,-1])
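To see where the `[1, 9]` in the error comes from, here is a minimal sketch that repeats the slicing from the code above on the rows of the CSV file shown in this question. The rows are inlined with `StringIO` (an assumption, so the sketch runs without `E:/TestPythonCode/animalData.csv`), the file is read with pandas' default dtypes rather than `dtype = 'category'`, and `Class` is converted with `Series.map`; those details differ slightly from the question's code. The key point: `iloc[-1:]` is a *row* slice, so `X` keeps only the last animal, while `iloc[:, -1]` takes the `Class` column for all nine animals.

```python
from io import StringIO

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same rows as animalData.csv in the question, inlined so the sketch runs on its own.
CSV = """Name,teethLength,weight,length,hieght,speed,Calorie Intake,Bite Force,Prey Speed,PreySize,EyeSight,Smell,Class
T-Rex,12,15432,40,20,33,40000,12800,20,19841,0,0,Primary Hunter
Crocodile,4,2400,23,1.6,8,2500,3700,30,881,0,0,Primary Hunter
Lion,2.7,416,9.8,3.9,50,7236,650,35,1300,0,0,Primary Hunter
Bear,3.6,600,7,3.35,40,20000,975,0,0,0,0,Primary Scavenger
Tiger,3,260,12,3,40,7236,1050,37,160,0,0,Primary Hunter
Hyena,0.27,160,5,2,37,5000,1100,20,40,0,0,Primary Scavenger
Jaguar,2,220,5.5,2.5,40,5000,1350,15,300,0,0,Primary Hunter
Cheetah,1.5,154,4.9,2.9,70,2200,475,56,185,0,0,Primary Hunter
KomodoDragon,0.4,150,8.5,1,13,1994,240,24,110,0,0,Primary Scavenger
"""

df = pd.read_csv(StringIO(CSV))  # default dtype inference, no dtype='category'
df = df.drop(["Name"], axis=1)
df["Class"] = df["Class"].map({"Primary Hunter": 0, "Primary Scavenger": 1})

X_bad = df.iloc[-1:]   # row slice: only the LAST row, all 12 remaining columns
y = df.iloc[:, -1]     # column slice: the Class column for all 9 rows
print(X_bad.shape, y.shape)  # (1, 12) (9,)

error_message = ""
try:
    LogisticRegression().fit(X_bad, y)
except ValueError as err:
    error_message = str(err)
print(error_message)  # Found input variables with inconsistent numbers of samples: [1, 9]
```

So the `1` is the single row in `X` and the `9` is the nine labels in `y`.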

Here is the CSV file:

Name,teethLength,weight,length,hieght,speed,Calorie Intake,Bite Force,Prey Speed,PreySize,EyeSight,Smell,Class
T-Rex,12,15432,40,20,33,40000,12800,20,19841,0,0,Primary Hunter
Crocodile,4,2400,23,1.6,8,2500,3700,30,881,0,0,Primary Hunter
Lion,2.7,416,9.8,3.9,50,7236,650,35,1300,0,0,Primary Hunter
Bear,3.6,600,7,3.35,40,20000,975,0,0,0,0,Primary Scavenger
Tiger,3,260,12,3,40,7236,1050,37,160,0,0,Primary Hunter
Hyena,0.27,160,5,2,37,5000,1100,20,40,0,0,Primary Scavenger
Jaguar,2,220,5.5,2.5,40,5000,1350,15,300,0,0,Primary Hunter
Cheetah,1.5,154,4.9,2.9,70,2200,475,56,185,0,0,Primary Hunter
KomodoDragon,0.4,150,8.5,1,13,1994,240,24,110,0,0,Primary Scavenger
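Separately from the shape problem, `dtype = 'category'` in the question's `pd.read_csv` call turns every column, including the numeric measurements, into a categorical whose categories are the raw strings from the file, whereas scikit-learn estimators expect numeric features. A small sketch using two of the rows above (a subset chosen only for brevity) compares it with pandas' default type inference:

```python
from io import StringIO

import pandas as pd

# Two rows and a few columns of the CSV above are enough to show the effect.
SNIPPET = """Name,teethLength,weight,Class
T-Rex,12,15432,Primary Hunter
Lion,2.7,416,Primary Hunter
"""

as_category = pd.read_csv(StringIO(SNIPPET), dtype="category")
inferred = pd.read_csv(StringIO(SNIPPET))

# With dtype='category', the numbers are kept as string categories such as '12' and '2.7'.
print(as_category["teethLength"].cat.categories.tolist())
# Without it, pandas infers numeric dtypes for the measurement columns.
print(inferred.dtypes["teethLength"], inferred.dtypes["weight"])  # float64 int64
```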

【Comments】:

  • It should be X = dataframe.iloc[:, :-1]
  • Doing that only gives me the one class, which is what I'm using as the label: 0 is hunter and 1 is scavenger.
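In `iloc[rows, columns]` the part before the comma selects rows and the part after it selects columns. The comment's `iloc[:, :-1]` is therefore meant for `X` (all rows, every column except the last), while `y` stays `iloc[:, -1]` (all rows, only the `Class` column). A sketch on a hypothetical three-row frame with the same layout (features first, `Class` last):

```python
import pandas as pd

# Hypothetical toy frame with the same layout as the cleaned data.
df = pd.DataFrame({
    "teethLength": [12, 4, 0.4],
    "weight": [15432, 2400, 150],
    "Class": [0, 0, 1],
})

X = df.iloc[:, :-1]  # all rows, every column EXCEPT the last -> features
y = df.iloc[:, -1]   # all rows, ONLY the last column -> labels

print(list(X.columns))   # ['teethLength', 'weight']
print(y.name)            # Class
print(X.shape, y.shape)  # (3, 2) (3,)
```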

Tags: python pandas numpy scikit-learn


【Answer 1】:

Use:

X = dataframe.iloc[:,0:-1]

y = dataframe.iloc[:,-1]
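Putting this slicing back into the question's workflow gives something like the sketch below. It assumes the CSV rows are inlined with `StringIO` and read with default dtypes, and it raises `max_iter` because the unscaled features can slow the default solver's convergence; neither detail comes from the answer itself.

```python
from io import StringIO

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Rows from the question's CSV, inlined so the sketch is self-contained.
CSV = """Name,teethLength,weight,length,hieght,speed,Calorie Intake,Bite Force,Prey Speed,PreySize,EyeSight,Smell,Class
T-Rex,12,15432,40,20,33,40000,12800,20,19841,0,0,Primary Hunter
Crocodile,4,2400,23,1.6,8,2500,3700,30,881,0,0,Primary Hunter
Lion,2.7,416,9.8,3.9,50,7236,650,35,1300,0,0,Primary Hunter
Bear,3.6,600,7,3.35,40,20000,975,0,0,0,0,Primary Scavenger
Tiger,3,260,12,3,40,7236,1050,37,160,0,0,Primary Hunter
Hyena,0.27,160,5,2,37,5000,1100,20,40,0,0,Primary Scavenger
Jaguar,2,220,5.5,2.5,40,5000,1350,15,300,0,0,Primary Hunter
Cheetah,1.5,154,4.9,2.9,70,2200,475,56,185,0,0,Primary Hunter
KomodoDragon,0.4,150,8.5,1,13,1994,240,24,110,0,0,Primary Scavenger
"""

dataframe = pd.read_csv(StringIO(CSV))
dataframe = dataframe.drop(["Name"], axis=1)
dataframe["Class"] = dataframe["Class"].map({"Primary Hunter": 0, "Primary Scavenger": 1})

X = dataframe.iloc[:, 0:-1]  # 9 rows x 11 feature columns
y = dataframe.iloc[:, -1]    # 9 labels
print(X.shape, y.shape)      # (9, 11) (9,)

# Raise the iteration cap: unscaled features can slow convergence.
logReg = LogisticRegression(max_iter=1000)
logReg.fit(X, y)
predictions = logReg.predict(X)
print(predictions)
```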

【Comments】:

    【Answer 2】:

    You need to label-encode the names.
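For reference, `LabelEncoder` sorts the distinct names and replaces each one with its index in that sorted list. Because every animal name here is unique, the resulting codes act as row identifiers rather than measurements; the question's own code drops `Name` instead, so keeping it is a modelling choice. A short sketch on the nine names from the question:

```python
from sklearn.preprocessing import LabelEncoder

names = ["T-Rex", "Crocodile", "Lion", "Bear", "Tiger",
         "Hyena", "Jaguar", "Cheetah", "KomodoDragon"]

encoder = LabelEncoder()
codes = encoder.fit_transform(names)

# classes_ holds the sorted names; each code is the name's index in that list.
print(encoder.classes_.tolist())
# ['Bear', 'Cheetah', 'Crocodile', 'Hyena', 'Jaguar', 'KomodoDragon', 'Lion', 'T-Rex', 'Tiger']
print(codes.tolist())  # [7, 2, 6, 0, 8, 3, 4, 1, 5]
```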

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from io import StringIO
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, roc_curve
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    
    # Data rows from the question (no header row)
    txt = """T-Rex,12,15432,40,20,33,40000,12800,20,19841,0,0,Primary Hunter
    Crocodile,4,2400,23,1.6,8,2500,3700,30,881,0,0,Primary Hunter
    Lion,2.7,416,9.8,3.9,50,7236,650,35,1300,0,0,Primary Hunter
    Bear,3.6,600,7,3.35,40,20000,975,0,0,0,0,Primary Scavenger
    Tiger,3,260,12,3,40,7236,1050,37,160,0,0,Primary Hunter
    Hyena,0.27,160,5,2,37,5000,1100,20,40,0,0,Primary Scavenger
    Jaguar,2,220,5.5,2.5,40,5000,1350,15,300,0,0,Primary Hunter
    Cheetah,1.5,154,4.9,2.9,70,2200,475,56,185,0,0,Primary Hunter
    KomodoDragon,0.4,150,8.5,1,13,1994,240,24,110,0,0,Primary Scavenger"""
    
    # header=None so the first animal is kept as data instead of becoming the header
    df = pd.read_csv(StringIO(txt), header=None,
                     names=['Name', 'TeethLength', 'Weight', 'Length', 'Height', 'Speed',
                            'Calorie Intake', 'Bite Force', 'Prey Speed', 'PreySize',
                            'EyeSight', 'Smell', 'Class'])
    
    # Map the string labels to 0 (hunter) / 1 (scavenger)
    df['Class'] = df['Class'].map({"Primary Hunter": 0, "Primary Scavenger": 1})
    
    # Features: every column except Class, with Name label-encoded into Name_enc
    COLUMNS = [column for column in df.columns if column != 'Class']
    X = df[COLUMNS].copy()  # copy avoids SettingWithCopyWarning below
    y = df['Class']
    encoder = LabelEncoder()
    X['Name_enc'] = encoder.fit_transform(X['Name'])
    X = X.drop(['Name'], axis=1)
    
    # Standardize the features, then fit the classifier
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    logReg = LogisticRegression()
    logReg.fit(X, y)
    
    y_pred_prob = logReg.predict_proba(X)
    predictions = logReg.predict(X)
    
    # How many animals were predicted as each class
    sns.countplot(x=predictions)
    plt.show()
    
    # ROC curve from the predicted probability of class 1 (scavenger)
    fpr, tpr, thresholds = roc_curve(y, y_pred_prob[:, 1])
    plt.plot([0, 1], [0, 1], 'k--')
    plt.plot(fpr, tpr)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve')
    plt.show()
    
    # Confusion matrix on the training data
    cm = confusion_matrix(y, predictions)
    sns.heatmap(cm, annot=True, fmt='g')
    plt.show()
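The code above fits `StandardScaler` and `LogisticRegression` as separate steps and predicts on the same `X` it was trained on. To predict for a new animal, that input has to go through the same fitted scaler first. One way to keep the two steps together is scikit-learn's `make_pipeline`; the sketch below is a swapped-in variant rather than the answer's code, it drops `Name` (as the question does) instead of label-encoding it, and `new_animal` is a hypothetical input that simply reuses the T-Rex row.

```python
from io import StringIO

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Rows from the question's CSV, inlined so the sketch is self-contained.
CSV = """Name,teethLength,weight,length,hieght,speed,Calorie Intake,Bite Force,Prey Speed,PreySize,EyeSight,Smell,Class
T-Rex,12,15432,40,20,33,40000,12800,20,19841,0,0,Primary Hunter
Crocodile,4,2400,23,1.6,8,2500,3700,30,881,0,0,Primary Hunter
Lion,2.7,416,9.8,3.9,50,7236,650,35,1300,0,0,Primary Hunter
Bear,3.6,600,7,3.35,40,20000,975,0,0,0,0,Primary Scavenger
Tiger,3,260,12,3,40,7236,1050,37,160,0,0,Primary Hunter
Hyena,0.27,160,5,2,37,5000,1100,20,40,0,0,Primary Scavenger
Jaguar,2,220,5.5,2.5,40,5000,1350,15,300,0,0,Primary Hunter
Cheetah,1.5,154,4.9,2.9,70,2200,475,56,185,0,0,Primary Hunter
KomodoDragon,0.4,150,8.5,1,13,1994,240,24,110,0,0,Primary Scavenger
"""

df = pd.read_csv(StringIO(CSV))
X = df.drop(columns=["Name", "Class"])
y = df["Class"].map({"Primary Hunter": 0, "Primary Scavenger": 1})

# The scaler is fitted inside the pipeline, so predict() reuses its mean/std.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

new_animal = X.iloc[[0]]  # hypothetical "new" input: reuses the T-Rex row
new_prediction = model.predict(new_animal)
print(new_prediction)
print(model.predict_proba(X).shape)  # (9, 2)
```

Calling `model.predict` on new rows applies the scaling and the classifier in one step, so the scaler can't accidentally be refitted on the new data.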
    

    【Comments】:
