【Question title】:Erlang monitor multiple processes
【Posted】:2017-10-30 05:28:56
【Question description】:

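I need to monitor a bunch of worker processes. Currently I can monitor 1 process with 1 monitor. How do I scale this to monitoring N worker processes? Do I need to spawn N monitors as well? If so, what happens if one of the spawned monitors fails/crashes?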

【Question comments】:

    Tags: process erlang monitor


    【Solution 1】:

    Do I need to spawn N monitors as well?

    No:

    -module(mo).
    -compile(export_all).
    
    worker(Id) ->
        timer:sleep(1000 * rand:uniform(5)),
        io:format("Worker~w: I'm still alive~n", [Id]),
        worker(Id).
    
    create_workers(N) ->
        Workers = [  % { {Pid, Ref}, Id }
            { spawn_monitor(?MODULE, worker, [Id]), Id }
            || Id <- lists:seq(1, N)
        ],
        monitor_workers(Workers).
    
    monitor_workers(Workers) ->
        receive
            {'DOWN', Ref, process, Pid, Why} ->
                Worker = {Pid, Ref},
                case is_my_worker(Worker, Workers) of
                    true  ->  
                        NewWorkers = replace_worker(Worker, Workers, Why),
                        io:format("Old Workers:~n~p~n", [Workers]),
                        io:format("New Workers:~n~p~n", [NewWorkers]),
                        monitor_workers(NewWorkers);
                    false -> 
                        monitor_workers(Workers)
                end;
            _Other -> 
                monitor_workers(Workers)
        end.
    
    is_my_worker(Worker, Workers) ->
        lists:keymember(Worker, 1, Workers).
    
    replace_worker(Worker, Workers, Why) ->
        {{Pid, _}, Id} = lists:keyfind(Worker, 1, Workers),
        %% ~p prints any exit reason; ~s would fail on non-string terms such as tuples.
        io:format("Worker~w (~w) went down: ~p~n", [Id, Pid, Why]),
        NewWorkers = lists:keydelete(Worker, 1, Workers),
        NewWorker = spawn_monitor(?MODULE, worker, [Id]),
        [{NewWorker, Id}|NewWorkers].
    
    start() ->
        observer:start(),  %%In the Processes tab, you can right click on a worker and kill it.
        create_workers(4).
    

    In the shell:

    $ ./run
    Erlang/OTP 19 [erts-8.2] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
    
    Eshell V8.2  (abort with ^G)
    
    
    1> Worker3: I'm still alive
    Worker1: I'm still alive
    Worker2: I'm still alive
    Worker4: I'm still alive
    Worker3: I'm still alive
    Worker1: I'm still alive
    Worker4: I'm still alive
    Worker2: I'm still alive
    Worker3: I'm still alive
    Worker1: I'm still alive
    Worker4: I'm still alive
    Worker3 (<0.87.0>) went down: killed
    Old Workers:
    [{{<0.85.0>,#Ref<0.0.4.292>},1},
     {{<0.86.0>,#Ref<0.0.4.293>},2},
     {{<0.87.0>,#Ref<0.0.4.294>},3},
     {{<0.88.0>,#Ref<0.0.4.295>},4}]
    New Workers:
    [{{<0.2386.0>,#Ref<0.0.1.416>},3},
     {{<0.85.0>,#Ref<0.0.4.292>},1},
     {{<0.86.0>,#Ref<0.0.4.293>},2},
     {{<0.88.0>,#Ref<0.0.4.295>},4}]
    Worker2: I'm still alive
    Worker1: I'm still alive
    Worker2: I'm still alive
    Worker1: I'm still alive
    Worker1: I'm still alive
    Worker4: I'm still alive
    Worker3: I'm still alive
    Worker2: I'm still alive
    Worker1: I'm still alive
    Worker3: I'm still alive
    Worker4: I'm still alive
    Worker1: I'm still alive
    Worker4 (<0.88.0>) went down: killed
    Old Workers:
    [{{<0.2386.0>,#Ref<0.0.1.416>},3},
     {{<0.85.0>,#Ref<0.0.4.292>},1},
     {{<0.86.0>,#Ref<0.0.4.293>},2},
     {{<0.88.0>,#Ref<0.0.4.295>},4}]
    New Workers:
    [{{<0.5322.0>,#Ref<0.0.1.9248>},4},
     {{<0.2386.0>,#Ref<0.0.1.416>},3},
     {{<0.85.0>,#Ref<0.0.4.292>},1},
     {{<0.86.0>,#Ref<0.0.4.293>},2}]
    Worker3: I'm still alive
    Worker2: I'm still alive
    Worker4: I'm still alive
    Worker1: I'm still alive
    Worker3: I'm still alive
    Worker3: I'm still alive
    Worker2: I'm still alive
    Worker1 (<0.85.0>) went down: killed
    Old Workers:
    [{{<0.5322.0>,#Ref<0.0.1.9248>},4},
     {{<0.2386.0>,#Ref<0.0.1.416>},3},
     {{<0.85.0>,#Ref<0.0.4.292>},1},
     {{<0.86.0>,#Ref<0.0.4.293>},2}]
    New Workers:
    [{{<0.5710.0>,#Ref<0.0.1.10430>},1},
     {{<0.5322.0>,#Ref<0.0.1.9248>},4},
     {{<0.2386.0>,#Ref<0.0.1.416>},3},
     {{<0.86.0>,#Ref<0.0.4.293>},2}]
    Worker2: I'm still alive
    Worker3: I'm still alive
    Worker4: I'm still alive
    Worker3: I'm still alive
    

    I think the version below may be more efficient: it uses lists:map() to find and replace the crashed worker, so it traverses the list of workers only once:

    -module(mo).
    -compile(export_all).
    
    worker(Id) ->
        timer:sleep(1000 * rand:uniform(5)),
        io:format("Worker~w: I'm still alive~n", [Id]),
        worker(Id).
    
    create_workers(N) ->
        Workers = [  % { {Pid, Ref}, Id }
            { spawn_monitor(?MODULE, worker, [Id]), Id }
            || Id <- lists:seq(1,N)
        ],
        monitor_workers(Workers).
    
    monitor_workers(Workers) ->
        receive
            {'DOWN', Ref, process, Pid, Why} ->
                CrashedWorker = {Pid, Ref},
                NewWorkers = replace(CrashedWorker, Workers, Why),
                io:format("Old Workers:~n~p~n", [Workers]),
                io:format("New Workers:~n~p~n", [NewWorkers]),
                monitor_workers(NewWorkers);
            _Other -> 
                monitor_workers(Workers)
        end.
    
    replace(CrashedWorker, Workers, Why) ->
        lists:map(fun(PidRefId) ->
                          { {Pid,_Ref}=Worker, Id} = PidRefId,
                          case Worker =:= CrashedWorker of
                              true ->  %replace worker
                                  %% ~p prints any exit reason, not only strings and atoms
                                  io:format("Worker~w (~w) went down: ~p~n", 
                                            [Id, Pid, Why]),
                                  {spawn_monitor(?MODULE, worker, [Id]), Id}; %=> { {Pid,Ref}, Id }
                              false ->  %leave worker alone
                                  PidRefId  
                          end
                  end,
                  Workers).
    
    start() ->
        observer:start(),  %%In the Processes tab, you can right click on a worker and kill it.
        create_workers(4).
    

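    If so, what happens if one of the spawned monitors fails/crashes?

    Erlang has multiple server farms in different countries, and Erlang has access to multiple redundant power grids, so Erlang will restart everything in a fault-tolerant distributed system that never fails. It's all built in. You don't have to worry about anything. :)

    Seriously, though... anywhere you can imagine a failure, it has to be backed up, e.g. by another monitoring process on another computer.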
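
As a rough illustration of "backing up" the monitor, here is a minimal sketch of a second process that watches the monitor itself and starts a fresh one when it goes down. The module name `mo_watcher` and its functions are hypothetical, not part of the answer above, and the sketch runs on the same node rather than on another computer.

```erlang
-module(mo_watcher).
-export([start/3, loop/3]).

%% Hypothetical helper: run apply(M, F, A) in its own process, watch it,
%% and start a fresh copy every time it goes down.
start(M, F, A) ->
    spawn(?MODULE, loop, [M, F, A]).

loop(M, F, A) ->
    {Pid, Ref} = spawn_monitor(M, F, A),
    receive
        {'DOWN', Ref, process, Pid, Why} ->
            io:format("Monitor ~w went down: ~p, restarting~n", [Pid, Why]),
            loop(M, F, A)
    end.
```

With the `mo` module above it could be started as `mo_watcher:start(mo, create_workers, [4])`. Two caveats: workers created with spawn_monitor are not linked to their monitor, so they outlive it and the restarted monitor spawns a new set next to the orphans; and the watcher restarts unconditionally, so a function that returns immediately is restarted in a tight loop. OTP supervisors handle both problems.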

    【Comments】:

      【Solution 2】:

      Don't spawn and then monitor; that has caused production problems in the past. Use spawn_monitor instead.

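      You can start and monitor multiple processes from your supervisor. If you look at the documentation for monitor, you'll notice that each time a monitored process dies, a message like this:

      {'DOWN', MonitorRef, Type, Object, Info}
      

      is sent to the supervisor process that was monitoring the process that just died.

      Then you can decide what to do. MonitorRef is the reference you got when you started monitoring the process, and Object is the Pid of the process that died, or {RegisteredName, Node} if you monitored it by its registered name. Info is the exit reason.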
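
To make the fields of the 'DOWN' message concrete, here is a small sketch that collects three such messages: one for a process monitored by pid, one for a process that was already dead when the monitor was set up, and one for a process monitored by registered name. The module name `down_demo` and the registered name `demo_worker` are made up for the example.

```erlang
-module(down_demo).
-export([run/0]).

%% Returns the three 'DOWN' messages received by the calling process.
run() ->
    %% 1. Monitor by pid: Object is the pid, Info is the exit reason.
    {Pid1, Ref1} = spawn_monitor(fun() -> exit(boom) end),
    M1 = receive {'DOWN', Ref1, process, Pid1, boom} = D1 -> D1 end,

    %% 2. Monitor a process that is already dead: a 'DOWN' message
    %%    still arrives, with Info = noproc.
    Ref2 = erlang:monitor(process, Pid1),
    M2 = receive {'DOWN', Ref2, process, Pid1, noproc} = D2 -> D2 end,

    %% 3. Monitor by registered name: Object is {Name, Node}.
    Pid3 = spawn(fun() -> receive stop -> ok end end),
    true = register(demo_worker, Pid3),   % demo_worker is a made-up name
    Ref3 = erlang:monitor(process, demo_worker),
    Pid3 ! stop,
    M3 = receive
             {'DOWN', Ref3, process, {demo_worker, _Node}, normal} = D3 -> D3
         end,
    [M1, M2, M3].
```

Case 2 is the situation discussed in the comments below: monitoring after the target has died does not lose the notification, it just reports noproc instead of the real exit reason.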

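      Writing some sample code with monitors is a good exercise, but try to stick with the OTP libraries and OTP supervisors.

      【Comments】:

      • Don't spawn and then monitor -- I've done that, and the monitoring process still received the "exit" message -- unlike link().
      • spawn_monitor exists for historical reasons, to avoid the bug where a process dies before it is monitored. That doesn't happen often, and in fact if you monitor a dead process you will still get a message. Still, while you are learning to build your own supervisor, it is good practice to use spawn_monitor rather than spawn followed by monitor.
      • "To avoid the bug where a process dies before it is monitored" -- as far as I can tell, it makes zero difference. I understand why spawn_link() was added, but monitor() doesn't seem to suffer from the same problem.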
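
For comparison, a minimal sketch of what the OTP route could look like for the same workers, assuming OTP 18 or later for the map-based supervisor flags and child specs. The module name `mo_sup` and the `start_worker/1` wrapper are assumptions made for this example.

```erlang
-module(mo_sup).
-behaviour(supervisor).
-export([start_link/1, init/1, start_worker/1, worker/1]).

%% Start a supervisor that owns N workers.
start_link(N) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, N).

init(N) ->
    %% Restart only the child that died; give up after more than
    %% 5 restarts within 10 seconds.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Children = [#{id => {worker, Id},
                  start => {?MODULE, start_worker, [Id]},
                  restart => permanent}
                || Id <- lists:seq(1, N)],
    {ok, {SupFlags, Children}}.

%% A supervisor child must be started linked and return {ok, Pid}.
start_worker(Id) ->
    {ok, spawn_link(?MODULE, worker, [Id])}.

worker(Id) ->
    timer:sleep(1000 * rand:uniform(5)),
    io:format("Worker~w: I'm still alive~n", [Id]),
    worker(Id).
```

Calling `mo_sup:start_link(4)` starts four workers; killing one (for example from observer) makes the supervisor start a replacement with the same Id. Real workers would more commonly be gen_server processes than bare spawn_link loops.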