【Posted】: 2013-04-20 21:38:01
【Question】:
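I am having trouble with message loss in Erlang.
When I exercise my code by hand it works 100% correctly. Only under a "load test", where I run the code with a large number of parallel requests, do some messages never reach the receiving side. After logging every step and every parameter value, I found that the address I send the message to is correct. The message itself is fine as well.
My question is: is this kind of "message loss" known in Erlang, and could it be a bug in Erlang itself?
I can post some of the code I am using if needed, but I don't think it would add much value to this particular question.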
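Update: here are the main parts of my application. It is a lot of code to illustrate my problem, but I could not reproduce the problem in a simplified version.
The application is an allocation system, i.e. it reserves collections of cells in a grid in parallel. The important parts are globalManager, an actor that controls the entire allocation system, and rowManager, which manages one row of the grid and locks that row while a reservation is being made.
When a region of cells has to be reserved, the function request_specific_cells is called. It sends a reservation request to every row manager whose row has to be modified. When a row manager has reserved the region in its row, it sends a confirmation to the globalManager. Once all row managers have confirmed, a confirmation is sent to the process that initiated the request. If one of the row managers fails, the globalManager sends a failure message instead.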
globalManager(Grid) ->
    receive
        {Pid, request_specific_cells, ReservationId, Coordinates, Ctr, XX} ->
            NewGrid = request_specific_cells(Grid, Pid, ReservationId, Coordinates, Ctr, XX);
        {Pid, confirm_region, ResId, Rid, Sid, Region, Section, Ctr, XX} ->
            NewGrid = confirm_region(Grid, Pid, ResId, Rid, Sid, Region, Section, Ctr, XX);
        {Pid, failed_region, Rid, Region, Ctr, XX} ->
            NewGrid = failed_region(Grid, Pid, Rid, Region, Ctr, XX);
        Else ->
            erlang:display({unexpectedMessage, actor, Else}),
            NewGrid = Grid
    end,
    globalManager(NewGrid).
request_specific_cells(Grid, Pid, ReservationId, Coordinates, Ctr, XX) ->
    {{Width, Height}, GridRows, MaxAllocationSize, FreeCells, {UnspecificRequests, NextId}, PendingRequests, BlockedRows} = Grid,
    {X, Y, W, H} = Coordinates,
    Rows = lists:seq(Y, Y+H-1),
    % Is one of the rows that have to be reserved currently blocked?
    % Note: xor yields false when an even number of rows is blocked; `or` may be what is intended.
    BlockedRow = lists:foldl(fun(B, Acc) -> Acc xor search_list(B, BlockedRows) end, false, Rows),
    Request = lists:keyfind(ReservationId, 1, UnspecificRequests),
    {ReservationId, _} = Request,
    % Now we need the addresses of the sections in which the region has to be reserved.
    SubSectionIds = [SPid || {_, SPid} <- [lists:keyfind(Row, 1, GridRows) || Row <- Rows]],
    % Storing the request enables us to roll back if one of the registrations fails.
    NewPendingRequests = PendingRequests ++ [{length(PendingRequests), 0, lists:map(fun(S) -> {S, null} end, SubSectionIds)}],
    % Send a registration command with the needed section to each corresponding section manager.
    [SPid ! {self(), request, Pid, ReservationId, length(PendingRequests), Coordinates, Ctr, XX} || SPid <- SubSectionIds],
    NewBlockedRows = Rows ++ BlockedRows,
    {{Width, Height}, GridRows, MaxAllocationSize, FreeCells, {UnspecificRequests, NextId}, NewPendingRequests, NewBlockedRows}.
confirm_region(Grid, Pid, URid, Rid, Sid, Region, Section, Cttr, XX) ->
    {Dimensions, GridRows, MaxAllocationSize, FreeCells, {UnspecificRequests, NextId}, PendingRequests, BlockedRows} = Grid,
    {_, RY, _, _} = Region,
    % Assumption: the post omits the line that binds Ctr and Spids; presumably
    % the pending request {Rid, Ctr, Spids} is looked up here.
    {Rid, Ctr, Spids} = lists:keyfind(Rid, 1, PendingRequests),
    if
        % All rows have confirmed the reservation, so the entire request is successful.
        (Ctr+1) == length(Spids) ->
            NewUnspecificRequests = lists:keydelete(URid, 1, UnspecificRequests),
            NewPendingRequests = lists:keydelete(Rid, 1, PendingRequests),
            NewSpids = lists:keyreplace(Sid, 1, Spids, {Sid, Section}),
            [Spid ! {self(), confirm_region, Sec} || {Spid, Sec} <- NewSpids],
            Pid ! {self(), request_specific_cells, Rid, success};
        true ->
            NewUnspecificRequests = UnspecificRequests,
            % Save the region that has to be marked or rolled back in the row.
            NewSpids = lists:keyreplace(Sid, 1, Spids, {Sid, Section}),
            % Increase the counter of confirmations.
            NewPendingRequests = lists:keyreplace(Rid, 1, PendingRequests, {Rid, Ctr+1, NewSpids})
    end,
    NewBlockedRows = delete_list(RY, BlockedRows),
    {Dimensions, GridRows, MaxAllocationSize, FreeCells, {NewUnspecificRequests, NextId}, NewPendingRequests, NewBlockedRows}.
rowManager(Row) ->
    receive
        {Mid, request, Pid, URid, Rid, Region, Ctr, XX} ->
            NewRow = request_region(Row, Mid, Pid, URid, Rid, Region, Ctr, XX);
        Else ->
            erlang:display({unexpectedMessage, rowManager, Else}),
            NewRow = Row
    end,
    rowManager(NewRow).
request_region(Row, Mid, Pid, URid, Rid, Coordinates, Ctr, XX) ->
    {RY, Content, Modified} = Row,
    {X, _, W, _} = Coordinates,
    if
        Modified == false ->
            Free = region_is_empty({X, 1, W, 1}, Content),
            if
                Free ->
                    NewModified = true,
                    NewContent = mark_region({X, 1, W, 1}, Content, reserved),
                    Mid ! {Pid, confirm_region, URid, Rid, self(), Coordinates, {X, 1, W, 1}, Ctr, XX};
                true ->
                    NewModified = false,
                    NewContent = Content,
                    Mid ! {Pid, failed_region, Rid, Coordinates, Ctr, XX}
            end;
        true ->
            NewModified = false,
            NewContent = Content,
            Mid ! {Pid, failed_region, Rid, Coordinates, Ctr, XX}
    end,
    {RY, NewContent, NewModified}.
The following code is used by the process that makes the reservation:
request_specific_cells(FollowUpPid, ReservationId, {X, Y, Width, Height}, Ctr, XX) ->
    FollowUpPid ! {self(), request_specific_cells, ReservationId, {X, Y, Width, Height}, Ctr, XX},
    receive
        {FollowUpPid, request_specific_cells, ReservationId, SuccessOrFailure} ->
            SuccessOrFailure
    end.
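I think this receiver dies before it receives the answer, because I know that
    Pid ! {self(), request_specific_cells, Rid, success}
in the confirm_region/9 function is always executed with the correct values, yet the message is not always received in the function above.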
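
One way to test the "somebody died" hypothesis is to make the waiting side notice a dead peer or a missing reply instead of blocking forever. The following is only a sketch and not the poster's protocol: the module name reply_watch, the function call/3 and the {From, MRef, Request} / {MRef, Reply} message shapes are assumptions made for illustration. It relies on erlang:monitor/2, erlang:demonitor/2 and the after clause of receive.

```erlang
-module(reply_watch).
-export([call/3, demo/0]).

%% Hypothetical helper: send Request to Server and wait for a reply tagged
%% with the monitor reference. Gives up when Server dies or when Timeout
%% milliseconds pass without an answer.
call(Server, Request, Timeout) ->
    MRef = erlang:monitor(process, Server),
    Server ! {self(), MRef, Request},
    receive
        {MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, _Pid, Reason} ->
            %% The server exited (or never existed) before answering.
            {error, {server_down, Reason}}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.

demo() ->
    %% A server that answers one request and then exits.
    Echo = spawn(fun() ->
                     receive {From, Ref, Msg} -> From ! {Ref, Msg} end
                 end),
    {ok, hello} = call(Echo, hello, 1000),
    %% A process that exits without reading its mailbox.
    Dead = spawn(fun() -> ok end),
    {error, {server_down, _}} = call(Dead, hello, 1000),
    %% A process that stays alive but never answers.
    Silent = spawn(fun() -> receive stop -> ok end end),
    {error, timeout} = call(Silent, hello, 50),
    Silent ! stop,
    ok.
```

With a wrapper of this kind, a load test would report server_down or timeout for the affected requests, which helps to tell a dead process apart from a reply that was consumed somewhere else.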
【Comments】:
-
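I have not heard of "lost messages". However, we could look at some code and try to find the cause. The messages may not arrive in the order you expect, though. Also, in most cases a message is lost if the receiver is dead.
-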
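Messages are not lost within a node! A message that is sent arrives. Do you have any overly general receive patterns that could receive these messages in the "wrong" place?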
-
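@MuzaayaJoshua Since both the message and the address are correct, I think the cause of my problem is the death of the receiver. Why would the receiver die before it receives the message, and how can I prevent that? I will try to provide some of my original code, which hopefully gives more information about my problem.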
-
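Sometimes a receiver dies because of a link. This happens when you use spawn_link or link with a process that exits right away and the receivers have not called process_flag(trap_exit, true), and so on.
-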
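@rvirding All other receivers in my program have a generic receive pattern that prints an error and shows the message. Since I never see any error being displayed, I assume the messages are not delivered to the wrong receiver.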
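
The concern about overly general receive patterns can be shown in isolation. This is a minimal sketch under stated assumptions: the module catch_all_demo, the function loop/1 and the messages {ping, From}, {reply, 42} and {swallowed, Msg} are hypothetical and do not come from the code in the question. It shows that a catch-all clause removes any non-matching message from the mailbox, so a later, more specific receive in the same process can never see it.

```erlang
-module(catch_all_demo).
-export([demo/0]).

%% A worker loop with a catch-all clause. Every message that is not
%% {ping, From} is taken out of the mailbox and reported to Parent,
%% so it is no longer available to any other receive in this process.
loop(Parent) ->
    receive
        {ping, From} ->
            From ! pong,
            loop(Parent);
        Else ->
            Parent ! {swallowed, Else},
            loop(Parent)
    end.

demo() ->
    Parent = self(),
    Worker = spawn(fun() -> loop(Parent) end),
    %% This message is meant for some other, more specific receive.
    Worker ! {reply, 42},
    Result = receive
                 {swallowed, Msg} -> {swallowed, Msg}
             after 1000 ->
                 nothing_swallowed
             end,
    exit(Worker, kill),
    Result.
```

Calling catch_all_demo:demo() returns {swallowed, {reply, 42}}. In the code from the question the catch-all clauses print via erlang:display, so a swallowed message would show up as an unexpectedMessage line in the output.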