【Question title】: Why does the classifier TPOT recommends score lower than LinearSVC?
【Posted】: 2017-08-06 06:03:11
【Question】:

I found that LinearSVC is one of TPOT's classifiers. I have been using it for my model and got a fairly good score (an sklearn score of 0.95).

import numpy as np
from sklearn import model_selection
from tpot import TPOTClassifier

# format_data() and create_labels() are the asker's own helpers (not shown in the question).

def process(stock):
    df = format_data(stock)
    df[['HSI Volume', 'HSI', stock]] = df[['HSI Volume', 'HSI', stock]].pct_change()

    # shift the next period's value back onto the current date
    df[stock + '_future'] = df[stock].shift(-1)
    df.replace([-np.inf, np.inf], np.nan, inplace=True)
    df.dropna(inplace=True)
    df['class'] = list(map(create_labels, df[stock], df[stock + '_future']))
    X = np.array(df.drop(columns=['class', stock + '_future']))
    # X = preprocessing.scale(X)
    y = np.array(df['class'])

    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)

    tpot = TPOTClassifier(generations=10, verbosity=2)
    fitting = tpot.fit(X_train, y_train)
    prediction = tpot.score(X_test, y_test)  # accuracy on the held-out 20% test split
    tpot.export('pipeline.py')
    return fitting, prediction
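The row alignment in `process()` can be hard to picture. Below is a minimal pure-Python sketch showing what `pct_change()`, `shift(-1)`, the inf/NaN cleanup and `dropna()` do to a single price column. `hypothetical_label` is an assumption standing in for the asker's `create_labels`, which the question does not show. Here it returns 1 when the next period's return is higher than the current one.

```python
def pct_change(values):
    """Mimic pandas Series.pct_change(): the first row has no predecessor (None)."""
    out = [None]
    for prev, cur in zip(values, values[1:]):
        # pandas would give +/-inf or NaN when prev == 0; process() turns those into NaN
        out.append(None if prev == 0 else (cur - prev) / prev)
    return out


def shift_back(values):
    """Mimic pandas Series.shift(-1): row t receives the value from row t + 1."""
    return values[1:] + [None]


def hypothetical_label(current, future):
    # Hypothetical stand-in for create_labels(), which the question does not show.
    return 1 if future > current else 0


def build_rows(prices):
    """Return (current_return, next_return, label) rows, as process() builds them."""
    returns = pct_change(prices)
    future = shift_back(returns)
    rows = []
    for cur, fut in zip(returns, future):
        if cur is None or fut is None:  # the equivalent of dropna()
            continue
        rows.append((cur, fut, hypothetical_label(cur, fut)))
    return rows


rows = build_rows([100, 110, 99, 99, 198])
print(rows)  # → [(0.1, -0.1, 0), (-0.1, 0.0, 1), (0.0, 1.0, 1)]
```

`pct_change` drops the first row and `shift(-1)` drops the last, so n prices yield at most n - 2 labelled rows.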

After ten generations, TPOT recommends GaussianNB, with an sklearn score of around 0.77.

Generation 1 - Current best internal CV score: 0.5322255571                     
Generation 2 - Current best internal CV score: 0.55453535828                    
Generation 3 - Current best internal CV score: 0.55453535828                    
Generation 4 - Current best internal CV score: 0.55453535828                    
Generation 5 - Current best internal CV score: 0.587469903893                   
Generation 6 - Current best internal CV score: 0.587469903893                   
Generation 7 - Current best internal CV score: 0.597194474469                   
Generation 8 - Current best internal CV score: 0.597194474469                   
Generation 9 - Current best internal CV score: 0.597194474469                   
Generation 10 - Current best internal CV score: 0.597194474469                  

Best pipeline: GaussianNB(RBFSampler(input_matrix, 0.22))
(None, 0.54637855142056824)
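The two kinds of numbers in this log measure different things. With default settings, each "internal CV score" is the mean k-fold cross-validation accuracy (5 folds by default) of the best pipeline found so far, computed on `X_train` only. The second element of the returned tuple is `tpot.score(X_test, y_test)`, the accuracy on the held-out 20%. The sketch below shows the two measures in pure Python. The nearest-centroid "model" is a hypothetical placeholder, not anything TPOT uses. The folds are contiguous and unshuffled for simplicity.

```python
def fit_nearest_centroid(X, y):
    """Hypothetical placeholder model: predict the class with the closest mean feature vector."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # squared Euclidean distance to each class centroid
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def cross_val_accuracy(X, y, fit, k=5):
    """Mean accuracy over k contiguous folds; each fold is scored by a model fit on the rest."""
    n = len(y)
    if n < k:
        raise ValueError("need at least k samples")
    bounds = [i * n // k for i in range(k + 1)]
    scores = []
    for lo, hi in zip(bounds, bounds[1:]):
        predict = fit(X[:lo] + X[hi:], y[:lo] + y[hi:])  # train on the other k - 1 folds
        hits = sum(predict(x) == t for x, t in zip(X[lo:hi], y[lo:hi]))
        scores.append(hits / (hi - lo))
    return sum(scores) / k


def holdout_accuracy(X_train, y_train, X_test, y_test, fit):
    """Accuracy on a separate test split, the analogue of tpot.score(X_test, y_test)."""
    predict = fit(X_train, y_train)
    return sum(predict(x) == t for x, t in zip(X_test, y_test)) / len(y_test)


X_demo = [[0], [1], [10], [11], [2], [12], [3], [13], [4], [14]]
y_demo = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(cross_val_accuracy(X_demo, y_demo, fit_nearest_centroid))  # → 1.0
```

The CV score is an average over folds of the training split. The test score comes from one held-out split. On real data the two can differ by several points, as 0.597 and 0.546 do here.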

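I'm just curious why TPOT doesn't recommend LinearSVC even though LinearSVC scores higher. Is it because the scoring mechanisms differ, so the best classifier differs as well?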

Thanks a lot!
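One way to check whether the gap comes from different scoring is to evaluate LinearSVC and TPOT's chosen pipeline under exactly the same protocol. This sketch assumes scikit-learn is installed. Synthetic `make_classification` data stands in for the stock features. `make_pipeline(RBFSampler(gamma=0.22), GaussianNB())` is an assumed reading of the printed `GaussianNB(RBFSampler(input_matrix, 0.22))`, taking 0.22 to be RBFSampler's `gamma`. It is not the file TPOT exported.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's features (assumption: 3 features, 2 classes).
X, y = make_classification(n_samples=500, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "LinearSVC": LinearSVC(),
    # Assumed equivalent of TPOT's printed "GaussianNB(RBFSampler(input_matrix, 0.22))".
    "RBFSampler+GaussianNB": make_pipeline(RBFSampler(gamma=0.22, random_state=0), GaussianNB()),
}

results = {}
for name, model in models.items():
    # Same metric and same 5 folds of the training split, mirroring TPOT's default CV setup.
    cv_mean = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    # Held-out accuracy, the analogue of tpot.score(X_test, y_test).
    test_acc = model.fit(X_train, y_train).score(X_test, y_test)
    results[name] = (cv_mean, test_acc)
    print(f"{name}: CV accuracy {cv_mean:.3f}, test accuracy {test_acc:.3f}")
```

Running the same loop on the question's own `X_train`/`X_test` is the fair comparison. If LinearSVC still wins clearly under identical folds, the search had not found it yet. If the gap shrinks, the 0.95 was measured under a different protocol.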

【Discussion】:

  • The reason is that it did not run for enough generations, so the classifier is underfitting the data.

Tags: python machine-learning svm genetic-algorithm genetic-programming


【Answer 1】:

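My personal guess is that TPOT is stuck at a local maximum. Changing the test size, running more generations, or scaling the data might help. Also, could you rerun TPOT and check whether you get the same result? (My guess is no, because genetic optimization is non-deterministic due to mutation.)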
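A toy genetic algorithm shows both points in this answer. First, an evolutionary search can sit on a plateau for several generations. Second, its result depends on random mutations. This is a generic pure-Python illustration, not TPOT's algorithm (TPOT evolves scikit-learn pipelines using DEAP). The bit-string fitness, population size and mutation rate are arbitrary assumptions. TPOTClassifier itself accepts a `random_state` argument, which is the usual way to make a run repeatable.

```python
import random


def fitness(bits):
    """Toy objective (assumption): fraction of 1-bits, so the optimum is 1.0."""
    return sum(bits) / len(bits)


def evolve(generations, seed, n_bits=30, pop_size=20, mutation_rate=0.02):
    """Return the best-so-far fitness at each generation of an elitist GA."""
    rng = random.Random(seed)  # seeded RNG: the same seed replays the same run
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection
        children = [pop[0][:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # mutation
            children.append(child)
        pop = children
    return history


ten_gens = evolve(generations=10, seed=1)
hundred_gens = evolve(generations=100, seed=1)
print(ten_gens[-1], hundred_gens[-1])  # the longer run can only match or beat the shorter one
print(evolve(10, seed=1) == evolve(10, seed=1))  # → True: a fixed seed makes the run repeatable
```

Elitism means the best-so-far score can only stay flat or rise. This mirrors the repeated values in the question's log. In this toy, a longer run with the same seed replays the first 10 generations exactly and then keeps searching, while a different seed explores different candidates.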

【Discussion】:

  • You're right. I solved the problem by running TPOT for more generations and reached a higher accuracy.