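【Posted】: 2014-08-13 08:07:18
【Question】:
I just want to confirm that this has nothing to do with my code and instead needs to be fixed in the relevant Python package. (By the way, does this look like something I could patch by hand before the vendor ships an update?) I am using scikit-learn-0.15b1, which makes these calls. Thanks!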
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 495, in prepare
    '__parents_main__', file, path_name, etc
  File "H:\Documents\GitHub\health_wealth\code\controls\lasso\scikit_notreat_predictors.py", line 36, in <module>
    gs.fit(X_train, y_train)
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 597, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 379, in _fit
    for parameters in parameter_iterable
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 604, in __call__
    self._pool = MemmapingPool(n_jobs, **poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 559, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 400, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
    self._repopulate_pool()
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError:
    Attempt to start a new process before the current process
    has finished its bootstrapping phase.
    This probably means that you are on Windows and you have
    forgotten to use the proper idiom in the main module:
        if __name__ == '__main__':
            freeze_support()
            ...
    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce a Windows executable.
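Update: Here is my edited script, but it still produces exactly the same error after GridSearchCV spawns its processes. In fact, the crash comes a good while after the command reports how many folds and fits it will run, but beyond that I cannot tell when it crashes. Should I put freeze_support somewhere else?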
import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp
if __name__=='__main__':
    mp.freeze_support()
print("Started.")
# n = 10**6
# notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
# X = notreatadapter[1:][0:n]
# y = notreatadapter[0][0:n]
notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
X = notreatdata.iloc[:,1:]
y = notreatdata.iloc[:,0]
n = y.shape[0]
print("Data loaded.")
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)
print("Data split.")
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train) # Don't cheat - fit only on training data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test) # apply same transformation to test data
print("Data scaled.")
# build a model
from sklearn.linear_model import SGDClassifier
model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
#model.fit(X,y)
print("CV starts.")
from sklearn import grid_search
# run grid search
param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
gs.fit(X_train, y_train)
print("Scores for alphas:")
print(gs.grid_scores_)
print("Best estimator:")
print(gs.best_estimator_)
print("Best score:")
print(gs.best_score_)
print("Best parameters:")
print(gs.best_params_)
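One possible explanation, assuming everything after `mp.freeze_support()` runs at module level rather than inside the guard: on Windows, `multiprocessing` starts each worker by re-importing the main module, so unguarded code (including `gs.fit`, which creates the worker pool) runs again inside the worker and raises this error. `freeze_support()` on its own does not prevent that; the code that starts the workers has to sit under `if __name__ == '__main__':` as well. Below is a minimal standard-library sketch of that layout. The names `score`, `best_alpha`, and `main` are hypothetical stand-ins for the grid search, not scikit-learn APIs.

```python
import multiprocessing as mp


def score(alpha):
    """Hypothetical stand-in for one cross-validated fit; higher is better."""
    return -abs(alpha - 0.01)


def best_alpha(alphas, scores):
    """Return the alpha whose score is highest."""
    return max(zip(scores, alphas))[1]


def main():
    # Anything that loads data or starts worker processes goes in here, so
    # a worker that re-imports this module does not run it again.
    alphas = [10.0 ** -k for k in range(1, 7)]
    pool = mp.Pool(processes=2)
    try:
        # score must be defined at module level so workers can import it.
        scores = pool.map(score, alphas)
    finally:
        pool.close()
        pool.join()
    print("Best alpha: %g" % best_alpha(alphas, scores))


if __name__ == '__main__':
    mp.freeze_support()  # only needed when frozen into a Windows executable
    main()
```

Applied to the script above, this would mean moving everything from `print("Started.")` through the final `print` into `main()` and leaving only imports and function definitions at module level.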
【Discussion】:
Tags: python windows multiprocessing scikit-learn anaconda