【Question title】: "RuntimeError: No default context installed." when using TensorFlow Federated
【Posted】: 2021-10-12 14:17:52
【Question】:

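I am currently working on a federated learning project using TensorFlow Federated. I got the following error while sending requests to the server to check that my code works: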

    RuntimeError: No default context installed.
    
    You should not expect to get this error using the TFF API.

However, I only run into it under certain conditions.

The scenario is as follows (all the code is below):

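An HTTP request is sent from the website. The function upload_and_train in routes/developers.py handles the request. Inside it, start_processing is called to begin the pre-training steps (collecting the training data, initializing hyperparameters, and so on). Finally, federated_computation_new is called, which starts the federated learning. This is also where it crashes, when it reaches the call to iterative_process.initialize():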

    iterative_process = tff.learning.build_federated_averaging_process(model_fn,client_optimizer_fn=lambda: tf.keras.optimizers.SGD(lr=0.5))
    state = iterative_process.initialize()

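Here is the confusing part. If I run the code locally, everything goes smoothly and training works without errors. If I run it on the server, it also works for the first request. After that it crashes and returns the same error (detailed below) on every subsequent request until I restart the server. Then it again works perfectly on the first call and keeps crashing on the following ones.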

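This problem is driving me crazy and I can't figure it out. My only remaining idea is that something happens after the first call (a process that isn't closed, or something similar), so that subsequent calls don't get a "fresh" start? Although that shouldn't happen in the first place.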
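The post does not establish what actually differs between the first request and the later ones. One low-cost way to probe the hypothesis above is to record where each call runs. The sketch below assumes, without confirmation from the post, that the server may handle requests on different threads or processes. describe_execution_site is a hypothetical helper name, and the worker thread only simulates a request handler.

```python
import os
import threading


def describe_execution_site():
    """Return the process id and thread identity of the caller."""
    thread = threading.current_thread()
    return {
        "pid": os.getpid(),
        "thread_name": thread.name,
        "is_main_thread": thread is threading.main_thread(),
    }


# Simulate a request being handled on a worker thread.
results = []
worker = threading.Thread(
    target=lambda: results.append(describe_execution_site()),
    name="request-worker",
)
worker.start()
worker.join()

print(describe_execution_site())  # the thread running this script
print(results[0])                 # the simulated request thread
```

Logging the same dictionary at import time and again at the top of federated_computation_new would show whether the failing requests run somewhere other than where tensorflow_federated was first imported.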

The full error message is as follows:

    143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
INFO:werkzeug:143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
ERROR:main:Exception on /api/Developers/use_cases/text_processing/developer_id/4/upload_and_train [POST]
Traceback (most recent call last):
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/decorator.py", line 48, in wrapper
    response = function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/uri_parsing.py", line 144, in wrapper
    response = function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/validation.py", line 384, in wrapper
    return function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/parameter.py", line 121, in wrapper
    return function(**kwargs)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/routes/developers.py", line 46, in upload_and_train
    last_train_metrics = main_proc.start_processing(use_case,developer_id)
  File "processing/text_processing/main_proc.py", line 17, in start_processing
    state,metrics = federated_computation_new(train_dataset,test_dataset)
  File "processing/text_processing/federated_algorithm.py", line 29, in federated_computation_new
    state = iterative_process.initialize()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py", line 521, in __call__
    return context.invoke(self, arg)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 41, in invoke
    self._raise_runtime_error()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 23, in _raise_runtime_error
    raise RuntimeError(
RuntimeError: No default context installed.

You should not expect to get this error using the TFF API.

If you are getting this error when testing a module inside of `tensorflow_federated/python/core/...`, you may need to explicitly invoke `execution_contexts.set_local_execution_context()` in the `main` function of your test.

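The first function, which handles the incoming request. The request contains 4 parameters: two identifiers, "use_case" and "developer_id", and two formData files containing the training data, which are stored locally.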

def upload_and_train(use_case: str, developer_id: int):


    use_case_path = 'processing/'+use_case+'/'
    sys.path.append(use_case_path)
    import main_proc

    app_path = dirname(dirname(abspath(__file__)))
    file_dict = request.files
    db_File_True = file_dict["dataset_file1"]
    db_File_Fake = file_dict["dataset_file2"]
    true_csv_path = os.path.join(app_path+"/"+use_case_path+"db/", "True.csv")
    fake_csv_path = os.path.join(app_path+"/"+use_case_path+"db/", "Fake.csv")
    db_File_True.save(true_csv_path)
    db_File_Fake.save(fake_csv_path)
    time.sleep(5) #wait for the files to be copied before proceeding
    #THEN start processing
    last_train_metrics = main_proc.start_processing(use_case,developer_id) # <============== GOES INTO HERE & CRASHES
    metricsJson = trainMetricsToJSON(last_train_metrics)    

    return Response(status=200, response=metricsJson)

The function that starts the preprocessing:

def start_processing(use_case, developer_id:int = 0):
    globals.initialize(use_case,developer_id)
    globals.TRAINER_ID = developer_id
    
    
    train_dataset, test_dataset= get_preprocessed_train_test_data()

    state,metrics = federated_computation_new(train_dataset,test_dataset) # <============== GOES INTO HERE & CRASHES  
    trained_metrics= metrics['train']
    
    timestamp = int(time.time())
    globals.DATASET_ID = timestamp
    
    written_row = save_to_file_CSV(use_case,globals.TRAINER_ID,timestamp,globals.DATASET_ID,trained_metrics['sparse_categorical_accuracy'],trained_metrics['loss'])
    return written_row

The function that performs the federated training:

def federated_computation_new(train_dataset,test_dataset):

    # Training and evaluating the model
    iterative_process = tff.learning.build_federated_averaging_process(model_fn,client_optimizer_fn=lambda: tf.keras.optimizers.SGD(lr=0.5))
    state = iterative_process.initialize() # <============== CRASHES HERE

    print(type(state))

    for n in range(globals.EPOCHS):
        state, metrics = iterative_process.next(state, train_dataset)
        print('round  {}, training metrics={}'.format(n+1, metrics))

    evaluation = tff.learning.build_federated_evaluation(model_fn)
    eval_metrics = evaluation(state.model, train_dataset)
    print('Training evaluation metrics={}'.format(eval_metrics))

    test_metrics = evaluation(state.model, test_dataset)
    print('Test evaluation metrics={}'.format(test_metrics))
    #############################################################################################
    #Save Last Trained Model
    import pickle
    with open("processing/"+globals.USE_CASE+"/last_model",'wb') as f:
        pickle.dump(state, f)
    return state,metrics

def model_fn():
  keras_model = get_simple_LSTM_model()

  return tff.learning.from_keras_model(
      keras_model,
      input_spec=globals.INPUT_SPEC,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

The function in /home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py, line 521:

def __call__(self, *args, **kwargs):
    context = self._context_stack.current
    arg = pack_args(self._type_signature.parameter, args, kwargs, context)
    return context.invoke(self, arg) # <============== This returns the runtime Error

Thank you very much for your time and patience.

【Question comments】:

    Tags: python tensorflow runtime-error tensorflow-federated


    【Solution 1】:

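    I think we can point to the mechanism that "should" prevent this from happening and offer a workaround, but as for diagnosing the root cause here, I only have guesses at the moment.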

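    When you run import tensorflow_federated as tff, a line in TFF's __init__.py should execute, installing an execution context at the base of the global context stack, which TFF uses to manage what __call__ means. It is this context stack that the __call__ implementation in function_utils.py delegates to.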

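    Before this line executes, a "default" RuntimeErrorContext sits at the bottom of the stack, and it simply raises whenever someone tries to invoke anything against it (for that matter, ingesting something into this context would raise as well, but you are invoking a no-argument computation, so there is no argument to ingest).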
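The mechanism described above can be illustrated with a toy model in plain Python. This is not TFF's implementation: every class below is a simplified stand-in written for illustration, and only the names RuntimeErrorContext, invoke and the error text are taken from the traceback in the question.

```python
class RuntimeErrorContext:
    """Toy placeholder context: every invocation fails."""

    def invoke(self, comp, arg):
        raise RuntimeError("No default context installed.")


class LocalExecutionContext:
    """Toy context that simply runs the wrapped Python callable."""

    def invoke(self, comp, arg):
        return comp.fn() if arg is None else comp.fn(arg)


class ContextStack:
    """Stack whose bottom entry acts as the default context."""

    def __init__(self, default_context):
        self._stack = [default_context]

    def set_default_context(self, ctx):
        self._stack[0] = ctx

    @property
    def current(self):
        return self._stack[-1]


class Computation:
    """Callable that delegates to whatever context is current."""

    def __init__(self, fn, context_stack):
        self.fn = fn
        self._context_stack = context_stack

    def __call__(self, arg=None):
        return self._context_stack.current.invoke(self, arg)


stack = ContextStack(RuntimeErrorContext())
initialize = Computation(lambda: "initial state", stack)

# Before a real context is installed, the call raises.
try:
    initialize()
    outcome_before = "ok"
except RuntimeError as err:
    outcome_before = str(err)

# After installing a default context, the same call succeeds.
stack.set_default_context(LocalExecutionContext())
outcome_after = initialize()

print(outcome_before)
print(outcome_after)
```

In this model the error in the question corresponds to calling a computation while the placeholder is still the bottom entry of the stack that the call consults.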

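    So I think one possibility is that this code never runs the __init__.py file that TFF uses to install the context. It isn't obvious to me from the code snippets how that could happen, but I suppose it is possible.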

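    While we try to diagnose this further, we can offer you a reasonable workaround. If you call tff.backends.native.set_local_python_execution_context() (or set_local_execution_context, depending on your TFF version) inside the federated_computation_new function, this error should go away.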
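Since the setter name depends on the TFF version, the workaround could be wrapped in a small helper that calls whichever of the two names is present. This is only a sketch: install_local_execution_context is a hypothetical helper, and the fake module at the bottom is a stand-in so the example runs without TFF installed. Only the two setter names come from the answer.

```python
from types import SimpleNamespace


def install_local_execution_context(tff_module):
    """Call whichever local-context setter this TFF version exposes.

    Returns the name of the setter that was called.
    """
    native = tff_module.backends.native
    for name in ("set_local_python_execution_context",
                 "set_local_execution_context"):
        setter = getattr(native, name, None)
        if callable(setter):
            setter()
            return name
    raise AttributeError("no local execution context setter found")


# Stand-in for the real package (assumption: it exposes only the older name).
calls = []
fake_tff = SimpleNamespace(
    backends=SimpleNamespace(
        native=SimpleNamespace(
            set_local_execution_context=lambda: calls.append("installed"))))

used = install_local_execution_context(fake_tff)
print(used)
```

In the question's code this would be called as install_local_execution_context(tff) at the top of federated_computation_new, before iterative_process.initialize(). Note that the asker's comment below reports that calling the setter there did not change the behavior in their setup.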

    【Comments】:

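    • Thank you for your reply. In my case it is tff.backends.native.set_local_python_execution_context. However, it still returns the same error. I called it right before state = iterative_process.initialize() and verified that it was executed. It doesn't solve anything, though: the error pattern is the same.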