[Question title]: pyGAM `y data is not in domain of logit link function`
[Posted]: 2020-03-06 20:33:38
[Question description]:

I am trying to find out how strongly the chemical properties in the wine dataset affect its quality attribute.

Error:

ValueError: y data is not in domain of logit link function. Expected domain: [0.0, 1.0], but found [3.0, 9.0]

Code:

import pandas as pd

from pygam import LogisticGAM

white_data = pd.read_csv("winequality-white.csv",sep=';');

X = white_data[[
    "fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide",
    "total sulfur dioxide","density","pH","sulphates","alcohol"
]]

print(X.describe)

y = pd.Series(white_data["quality"]);

print(white_quality.describe)

white_gam = LogisticGAM().fit(X, y)

Output of the code above:

<bound method NDFrame.describe of       fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0               7.0              0.27         0.36            20.7      0.045   
1               6.3              0.30         0.34             1.6      0.049   
2               8.1              0.28         0.40             6.9      0.050   
3               7.2              0.23         0.32             8.5      0.058   
4               7.2              0.23         0.32             8.5      0.058   
...             ...               ...          ...             ...        ...   
4893            6.2              0.21         0.29             1.6      0.039   
4894            6.6              0.32         0.36             8.0      0.047   
4895            6.5              0.24         0.19             1.2      0.041   
4896            5.5              0.29         0.30             1.1      0.022   
4897            6.0              0.21         0.38             0.8      0.020   

      free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                    45.0                 170.0  1.00100  3.00       0.45   
1                    14.0                 132.0  0.99400  3.30       0.49   
2                    30.0                  97.0  0.99510  3.26       0.44   
3                    47.0                 186.0  0.99560  3.19       0.40   
4                    47.0                 186.0  0.99560  3.19       0.40   
...                   ...                   ...      ...   ...        ...   
4893                 24.0                  92.0  0.99114  3.27       0.50   
4894                 57.0                 168.0  0.99490  3.15       0.46   
4895                 30.0                 111.0  0.99254  2.99       0.46   
4896                 20.0                 110.0  0.98869  3.34       0.38   
4897                 22.0                  98.0  0.98941  3.26       0.32   

      alcohol  
0         8.8  
1         9.5  
2        10.1  
3         9.9  
4         9.9  
...       ...  
4893     11.2  
4894      9.6  
4895      9.4  
4896     12.8  
4897     11.8  

[4898 rows x 11 columns]>
<bound method NDFrame.describe of 0       6
1       6
2       6
3       6
4       6
       ..
4893    6
4894    5
4895    6
4896    7
4897    6
Name: quality, Length: 4898, dtype: int64>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-71-e1c5720823a6> in <module>
     16 print(white_quality.describe)
     17 
---> 18 white_gam = LogisticGAM().fit(X, y)

~/miniconda3/lib/python3.7/site-packages/pygam/pygam.py in fit(self, X, y, weights)
    893 
    894         # validate data
--> 895         y = check_y(y, self.link, self.distribution, verbose=self.verbose)
    896         X = check_X(X, verbose=self.verbose)
    897         check_X_y(X, y)

~/miniconda3/lib/python3.7/site-packages/pygam/utils.py in check_y(y, link, dist, min_samples, verbose)
    227                              .format(link, get_link_domain(link, dist),
    228                                      [float('%.2f'%np.min(y)),
--> 229                                       float('%.2f'%np.max(y))]))
    230     return y
    231 

ValueError: y data is not in domain of logit link function. Expected domain: [0.0, 1.0], but found [3.0, 9.0]

Files (I am using Jupyter Notebook, but I don't think you need it): https://drive.google.com/drive/folders/1RAj2Gh6WfdzpwtgbMaFVuvBVIWwoTUW5?usp=sharing

[Question discussion]:

    Tags: python pygam


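    [Solution 1]:

    You probably want to use LinearGAM. LogisticGAM is meant for classification tasks: its logit link only accepts target values in [0, 1], whereas the quality column here holds scores ranging from 3 to 9, which is exactly what the error message reports.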
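The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the helper names (`in_logit_domain`, `binarize_quality`, `fit_regression_gam`, `fit_classification_gam`) and the threshold of 7 are hypothetical and chosen only for the example. The first helper mirrors the range check that the error message describes, and the last one shows an alternative in case a classification model is really what is wanted.

```python
def in_logit_domain(y):
    """Return True if every target value lies in [0, 1], the range the logit link accepts."""
    return 0.0 <= min(y) and max(y) <= 1.0


def binarize_quality(y, threshold=7):
    """Turn quality scores into 0/1 labels (1 means score >= threshold).

    The threshold is an arbitrary illustration, not something from the question.
    """
    return [int(score >= threshold) for score in y]


def fit_regression_gam(X, y):
    """Model the quality score directly, as the answer suggests."""
    # Imported lazily so the helpers above work even without pygam installed.
    from pygam import LinearGAM
    return LinearGAM().fit(X, y)


def fit_classification_gam(X, y, threshold=7):
    """Alternative: keep LogisticGAM, but feed it a 0/1 target instead of raw scores."""
    from pygam import LogisticGAM
    return LogisticGAM().fit(X, binarize_quality(y, threshold))


# Scores like those in the question span 3..9, so they fall outside [0, 1].
scores = [6, 6, 5, 7, 3, 9]
labels = binarize_quality(scores)
print(in_logit_domain(scores))  # False
print(in_logit_domain(labels))  # True
```

With the `X` and `y` from the question, `white_gam = fit_regression_gam(X, y)` should fit without the domain error, and `white_gam.summary()` can then be used to inspect how each feature contributes.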
